CLONING, EXPRESSION AND DIAGNOSIS OF
HUMAN CYTOCHROME P450 2C19:
THE PRINCIPAL DETERMINANT OF S-MEPHENYTOIN METABOLISM



TECHNICAL FIELD 
The present invention relates generally to isolation
and exploitation of a novel member of the cytochrome P450 2C
subfamily of enzymes, 2C19, which is shown to be the principal
human determinant of S-mephenytoin metabolism. The
invention also relates to the isolation and exploitation of an
additional member of this family designated 2C18.

BACKGROUND OF THE INVENTION

The cytochromes P450 are a large family of
hemoprotein enzymes capable of metabolizing xenobiotics such
as drugs, carcinogens and environmental pollutants as well as
endobiotics such as steroids, fatty acids and prostaglandins.
Some members of the cytochrome P450 family are inducible in
both animals and cultured cells, while other forms are
constitutive. This group of enzymes has both harmful and
beneficial activities. Metabolic conversion of xenobiotics to
toxic, mutagenic and carcinogenic forms is a harmful activity.
Detoxification of some drugs and other xenobiotic substances
is a beneficial activity (Gelboin, Physiol. Rev. 60:1107-1).
A further beneficial activity is the metabolic processing of
some drugs to activated forms that have pharmacological
activity.

Genetic polymorphisms of P450 enzymes result in
phenotypically distinct subpopulations that differ in their
ability to perform particular drug biotransformation
reactions. These phenotypic distinctions have important
implications for selection of drugs. For example, a drug that
is safe when administered to most humans may cause intolerable
side-effects in an individual suffering from a defect in a
P450 enzyme required for detoxification of the drug.
Alternatively, a drug that is effective in most humans may be
ineffective in a particular subpopulation because of lack of a
P450 enzyme required for conversion of the drug to a
metabolically active form. Accordingly, it is important for
both drug development and clinical use to screen drugs to
determine which P450 enzymes are required for activation
and/or detoxification of the drug. It is also important to
identify individuals who are deficient in a particular P450
enzyme.

A cytochrome P450 polymorphism of particular concern
results in reduced levels of S-mephenytoin 4'-hydroxylase
activity in certain subpopulations (Küpfer et al., Eur. J.
Clin. Pharmacol. 26:753-759 (1984); Wedlund et al., Clin.
Pharmacol. Ther. 36:773-780 (1984)). Two phenotypes, extensive
and poor metabolizers, are present in the human population.
Poor metabolizers are detected at low frequencies in
Caucasians (2-5%) but at higher frequencies in the Oriental
population (~20%) (Nakamura et al., Clin. Pharmacol. Ther.
38:402-408 (1985); Jurima et al., Br. J. Clin. Pharmacol.
19:483-487 (1985)) and blacks (~12%). 4'-Hydroxylation of
S-mephenytoin is 3-10 fold higher than that of the R-enantiomer
in extensive metabolizers, but the ratio is approximately 1 or
less in poor metabolizers (Yasumori et al., Mol. Pharmacol.
35:443-449 (1990)). Rates of S-mephenytoin 4'-hydroxylation in
liver microsomes are also much higher than those of
R-mephenytoin in extensive metabolizers.

There is some evidence that S-mephenytoin
4'-hydroxylase activity resides in the cytochrome P450 2C family
of enzymes. A number of 2C human variants (designated 2C8,
2C9 and 2C10) have been partially purified and/or cloned.
See Shimada et al., J. Biol. Chem. 261:909-921 (1986); Kawano
et al., J. Biochem. (Tokyo) 102:493-501 (1987); Gut et al.,
Biochem. Biophys. Acta 884:435-447 (1986); Beaune et al.,
Biochem. Biophys. Acta 840:364-370 (1985); Ged et al.,
Biochemistry 27:6929-6940 (1988); Umbenhauer et al.,
Biochemistry 26:1094-1099 (1987); Kimura et al., Nucleic
Acids Res. 15:10053-10054 (1987); Shephard et al., Ann. Hum.
Genet. 53:23-31 (1989); Yasumori et al., J. Biochem. 102:1075-
1082 (1987); Relling et al., J. Pharmacol. Exper. Ther.
252:442-447 (1990).
A comparison of the P450 2C cDNAs and their predicted amino
acid sequences shows that about 70% of the amino acids are
absolutely conserved among the human P450 2C subfamily. Some
regions of human P450 2C protein sequences are particularly
highly conserved, and these regions may participate in
common P450 functions. Other regions show greater sequence
divergence and are likely responsible for the different
substrate specificities of the 2C members.

There has been considerable controversy as to
whether any of the known 2C members encodes the principal
human determinant of S-mephenytoin 4'-hydroxylase activity, in
which the polymorphism discussed above presumably resides.
The multiplicity and common properties of cytochromes P450
make it difficult to separate their different forms,
especially the minor forms. Even in situations where P450
cytochromes have been isolated in purified form by
conventional enzyme purification procedures, they have been
removed from the natural biological membrane association and
therefore require the addition of NADPH-cytochrome P450
reductase and other cell fractions for enzymatic activity.

The known members of the cytochrome P450 2C family
exhibit only low levels of S-mephenytoin 4'-hydroxylase
activity, if any. Moreover, such low levels of activity are
not specific for the S-enantiomer. For example, when the cDNA
isolated by Kimura et al. (1987), supra, was expressed in
HepG2 cells, it metabolized racemic and (R)-mephenytoin but
had no (S)-mephenytoin hydroxylase activity, suggesting that
the polymorphism in the metabolism of (S)-mephenytoin resides
in a different member of the P450 family. As a further
example, Yasumori et al. (1991), supra, reported that an
allelic variant of 2C9 (Arg144 Tyr358 Ile359 Gly417) showed a
low level of catalytic activity toward S-mephenytoin in a
cDNA-directed yeast expression system. However, Srivastava
et al., Mol. Pharmacol. 40:69-69 (1991) expressed an identical
cDNA in yeast and an Arg144 Cys358 Ile359 Asp417 variant (2C10
by present nomenclature) but were unable to demonstrate
catalytic activity of 2C9 or 2C10 toward S-mephenytoin.
Relling et al., J. Pharmacol. Exper. Ther. 252:442-447 (1990),
were also unable to demonstrate catalytic activity of a
Cys144 Tyr358 Ile359 Gly417 allelic variant of 2C9 toward
S-mephenytoin using a retroviral cDNA expression system in
HepG2 cells. In contrast, all of these 2C9 variants
metabolized tolbutamide in the various expression systems,
confirming that failure to observe S-mephenytoin
4'-hydroxylase activity was not due to deficiencies in the
expression system.

Based on the foregoing, it is apparent that a need
exists to identify and isolate the P450 2C family member
representing the principal determinant of S-mephenytoin
4'-hydroxylase activity in humans. There is also a need for
stable cell lines expressing the S-mephenytoin 4'-hydroxylase
activity. A need is also apparent for methods of screening
drugs for safety and efficacy in individuals deficient in
S-mephenytoin 4'-hydroxylase activity. There is also a need for
methods for diagnosing individuals deficient in S-mephenytoin
4'-hydroxylase activity. The present invention fulfills these
and other needs.

SUMMARY OF THE INVENTION
The invention provides purified cytochrome P450 2C19
polypeptides. The amino acid sequence of an exemplary P450
2C19 polypeptide is designated SEQ. ID. No. 1. Other
cytochrome P450 2C19 polypeptides usually comprise an amino
acid sequence having at least 97% sequence identity with the
exemplified sequence. Many of the 2C19 polypeptides of the
invention exhibit stereospecific S-mephenytoin 4'-hydroxylase
activity. The activity is typically at least about 1 nmol
mephenytoin per nmol of the purified polypeptide per minute.
The invention also provides purified cytochrome P450
2C18 polypeptides. The amino acid sequences of exemplary 2C18
polypeptides are designated SEQ. ID. Nos. 5 and 11.

In another aspect of the invention, purified DNA
segments encoding the P450 2C19 polypeptides described above
are provided. Some DNA segments encode the exemplary P450
2C19 having the amino acid sequence designated SEQ. ID.
No. 1. One such exemplary DNA segment is designated SEQ. ID.
No. 2. Other DNA segments encode the P450 2C18 polypeptides
described above. Exemplary DNA segments are designated SEQ.
ID. Nos. 6 and 12.

In a further aspect of the invention, stable cell
lines are provided. The cell lines comprise an exogenous DNA
segment encoding a cytochrome P450 2C19 polypeptide having at
least 97% sequence identity with the amino acid sequence
designated SEQ. ID. No. 1. The DNA segment is capable of
being expressed in the cell line. Cell lines preferably
produce high levels of the P450 2C19 polypeptide, such as
10-200 pmol of the polypeptide per mg of total microsomal
protein. Preferred cell lines are eukaryotic, including yeast
and insect cells.

The invention also provides methods of producing a
cytochrome P450 2C19 polypeptide. In these methods, a stable
cell line, as described above, is cultured under conditions
such that the DNA segment contained in the cell line is
expressed.

The invention also provides antibodies that
specifically bind to a 2C19 polypeptide comprising the amino
acid sequence designated SEQ. ID. No. 1. Preferred antibodies
are incapable of binding to nonallelic forms of 2C
polypeptides, such as 2C9.

In another aspect, the invention provides methods of
screening for a drug that is metabolized by S-mephenytoin
4'-hydroxylase activity. The drug is contacted with a cytochrome
P450 2C19 polypeptide. A metabolic product resulting from an
interaction between the drug and the polypeptide is detected.
The presence of the product indicates that the drug is
metabolized by the S-mephenytoin 4'-hydroxylase activity. The
cytochrome P450 2C19 used in the methods may be substantially
pure or may be a component of a lysate of a stable cell line.
The cytochrome P450 2C19 polypeptide may also be a component
of an intact stable cell line. Some methods further comprise
the steps of contacting the drug with a liver extract
comprising a mixture of cytochrome P450 polypeptides, and
detecting a metabolic product resulting from an interaction
between the drug and the mixture of cytochrome P450
polypeptides.

WO 95/30766    PCT/US95/05744

The invention also provides methods of identifying a
mutagenic, carcinogenic or cytotoxic compound. In some
methods, the compound is contacted with a stable cell line
capable of expressing a 2C19 polypeptide, such as described
above. Mutagenic, carcinogenic or cytotoxic effects of the
compound on the cell line are assayed. In other methods, the
compound is contacted with a cytochrome P450 2C19 polypeptide
in a reaction mixture. A metabolic product is generated
resulting from S-mephenytoin 4'-hydroxylase activity on the
compound. The metabolic product is assayed for mutagenic,
carcinogenic or cytotoxic effects on a test cell line. The
effects indicate that the compound is mutagenic, carcinogenic
or cytotoxic. In some methods, the test cell line is added to
the reaction mixture before, during or after the contacting
step. The 2C19 polypeptide used in these methods can be
substantially pure or a component of a lysate of a stable cell
line. The 2C19 polypeptide can also be a component of an
intact stable cell line. Salmonella typhimurium is a
preferred cell line.

The invention also provides methods for testing the
chemopreventive activity of an agent. A stable cell line
capable of expressing a 2C19 polypeptide, such as described
above, is contacted with an agent suspected of being
chemopreventive in the presence of a carcinogen. The agent
can be contacted with the cell line before addition of the
carcinogen. Effects of the agent on the cell line that are
indicative of chemopreventive activity are monitored.

The invention also provides methods for determining
the metabolites activated by a carcinogen or xenobiotic. A
stable cell line capable of expressing a 2C19 polypeptide,
such as described above, is contacted with the suspected
carcinogen or xenobiotic. Metabolites and/or their effects
are identified.

The invention also provides methods of detecting a 
cytochrome 2C19 polypeptide in a tissue sample. The tissue 
sample is contacted with an antibody that specifically binds 
to the 2C19 polypeptide preferably without specifically 
binding to nonallelic variants such as 2C9. Specific binding
between the antibody and the polypeptide is detected to 
indicate the presence of the polypeptide. 

In another aspect of the invention, methods of 
diagnosing a patient having a deficiency in S-mephenytoin 4'- 
hydroxylase activity are provided. In these methods, a sample 
of nucleic acids is obtained from the patient, and 
a cytochrome P450 2C19 DNA sequence from the nucleic acids in 
the sample is analyzed for the presence of a polymorphism 
indicative of the deficiency. The most frequently occurring 
polymorphisms in the P450 2C19 genes occur at nucleotides 681 
and 636 of the 2C19 gene. 

In some methods, the P450 2C19 DNA sequence subject 
to analysis is genomic. In such methods, an amplifying step 
is often primed from a forward primer sufficiently 
complementary with a first subsequence of the antisense strand 
of the 2C19 sequence to hybridize therewith, and a reverse 
primer sufficiently complementary to a second subsequence of 
the sense strand of the 2C19 sequence to hybridize therewith. 

Some methods detect a polymorphism at nucleotide 681 
of the coding region of the P450 2C19 DNA genomic sequence. 
This can be achieved by selecting a forward primer that 
hybridizes upstream from nucleotide 681 of the coding region, 
and a reverse primer that hybridizes downstream from 
nucleotide 681 of the coding region. Amplification products 
generated from these primers can be analyzed by digesting the 
amplified DNA segment with a restriction enzyme that
recognizes a site that includes nucleotide 681 of the coding 
region. 
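
The restriction-digest readout described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure of the examples: the two amplicon strings are invented toy sequences rather than 2C19 sequence, and the function name `digest` is hypothetical. The enzyme shown is SmaI, which the figure descriptions name as the polymorphic site in the wildtype sequence; its recognition sequence is CCCGGG, cut bluntly between CCC and GGG, so a G to A change within the site prevents cutting.

```python
# Illustrative sketch only. The amplicons below are invented toy sequences,
# not 2C19 sequence; `digest` is a hypothetical helper name.

SMAI_SITE = "CCCGGG"   # SmaI recognition sequence
SMAI_CUT_OFFSET = 3    # blunt cut between CCC and GGG


def digest(amplicon: str, site: str = SMAI_SITE,
           cut_offset: int = SMAI_CUT_OFFSET) -> list[int]:
    """Return the fragment lengths obtained by cutting at every site."""
    amplicon = amplicon.upper()
    cuts = []
    start = 0
    while True:
        idx = amplicon.find(site, start)
        if idx == -1:
            break
        cuts.append(idx + cut_offset)
        start = idx + 1
    bounds = [0] + cuts + [len(amplicon)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Toy "wildtype" amplicon containing the site, and a toy "mutant" amplicon
# in which the first G of the site is replaced by A (site destroyed).
toy_wildtype = "ATTTCCCGGGAACCC"
toy_mutant = "ATTTCCCAGGAACCC"

wildtype_fragments = digest(toy_wildtype)  # two fragments: site was cut
mutant_fragments = digest(toy_mutant)      # one fragment: amplicon uncut
```

In such a scheme a sample yielding only cut fragments would be read as homozygous for the allele carrying the site, a sample yielding only the uncut amplicon as homozygous for the allele lacking it, and a sample yielding both as heterozygous.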

Other methods detect a polymorphism at nucleotide
636 of the coding region of the P450 2C19 DNA genomic
sequence. This can be achieved using a forward primer that
hybridizes upstream from nucleotide 636 of the coding region,
and a reverse primer that hybridizes downstream of nucleotide
636 of the coding region. Amplification products are
conveniently analyzed by digestion with an enzyme that
recognizes a site that includes nucleotide 636 of the coding
region.

Other methods detect the 681 polymorphism by a 
different approach involving selective amplification of the 
wildtype or mutant allele. For example, for selective 
amplification of the wildtype allele, a suitable forward 
primer has about 10-50 contiguous nucleotides from the 
wildtype 2C19 sequence shown in Fig. 16 including the 
nucleotide at position 681 of the coding region. The forward 
primer primes amplification from the complement of the 
wildtype 2C19 sequence without priming amplification from the 
complement of the mutant 2C19 sequence shown in Fig. 16. 
Preferably, the 3' nucleotide of the forward primer is the
nucleotide at position 681. Analogously, the 681 mutant
allele can be amplified using a forward primer having
about 10-50 contiguous nucleotides from the mutant 2C19
sequence shown in Fig. 16 including the nucleotide at position
681 of the coding sequence. The forward primer primes
amplification from the complement of the mutant 2C19 sequence
without priming amplification from the complement of the
wildtype 2C19 sequence shown in Fig. 16.
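
The allele-selective primer design described above can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the toy sequences and the toy polymorphic position stand in for the Fig. 16 sequences and position 681, the function names are hypothetical, and priming is modelled crudely as an exact match of the whole primer, including its 3' nucleotide, to the sense strand.

```python
# Illustrative sketch only. TOY_WT / TOY_MUT and TOY_POS are invented and do
# not reproduce the 2C19 sequence or coding position 681.

TOY_WT = "ATGGCTTACCTGGATCCCGGGAAC"
TOY_POS = 19                                      # 1-based polymorphic position
TOY_MUT = TOY_WT[:TOY_POS - 1] + "A" + TOY_WT[TOY_POS:]  # G to A at TOY_POS


def allele_specific_forward_primer(sense: str, pos: int, length: int = 20) -> str:
    """Return a forward primer whose 3' nucleotide is the base at `pos`.

    `pos` is 1-based on the sense strand; `length` must be 10-50 nucleotides,
    matching the range given in the text.
    """
    if not 10 <= length <= 50:
        raise ValueError("primer length must be 10-50 nucleotides")
    if pos < length or pos > len(sense):
        raise ValueError("position does not leave room for the primer")
    return sense[pos - length:pos]


def primes(primer: str, sense_template: str) -> bool:
    """Crude model: priming requires a perfect match, 3' base included."""
    return primer in sense_template


wt_primer = allele_specific_forward_primer(TOY_WT, TOY_POS, length=12)
mut_primer = allele_specific_forward_primer(TOY_MUT, TOY_POS, length=12)
```

Under this simplified model the wildtype primer matches only the wildtype template and the mutant primer only the mutant template, which is the behaviour the text requires of the two forward primers. Real primer discrimination also depends on annealing conditions, which the sketch does not model.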

The invention also provides analogous methods for
detection of the 636 polymorphism.

In other methods, the segment of 2C19 DNA subject to 
analysis is a cDNA sequence. cDNA is produced by reverse 
transcribing mRNA in the sample to produce the cDNA sequence. 
In some methods for detecting the 681 polymorphism, the 
forward primer comprises about 10-50 contiguous nucleotides 
upstream of nucleotide 643 of the coding region of the
wildtype 2C19 cDNA sequence shown in Fig. 12 and hybridizes to
the complement of the 2C19 sequence upstream from nucleotide
643 of the coding region, and the reverse primer comprises
about 10-50 contiguous nucleotides from the complement of the
wildtype 2C19 cDNA sequence shown in Fig. 12 and hybridizes to
the 2C19 sequence downstream from nucleotide 682 of the coding
region. In other methods, the forward primer hybridizes to
the complement of the wildtype 2C19 cDNA sequence shown in
Fig. 12 between nucleotides 643 and 682 without hybridizing to
the complement of the mutant 2C19 cDNA sequence shown in
Fig. 12. In other methods, the reverse primer hybridizes to
the wildtype 2C19 cDNA sequence shown in Fig. 12 between
nucleotides 643 and 682 without hybridizing to the mutant 2C19
cDNA sequence shown in Fig. 12.

The invention provides analogous methods for
diagnosing the 636 polymorphism from cDNA. In some methods,
the forward primer comprises about 10-50 contiguous
nucleotides upstream of nucleotide 636 of the coding region of
the wildtype 2C19 cDNA sequence shown in Fig. 12, and the
reverse primer comprises about 10-50 contiguous nucleotides
from the complement of the wildtype 2C19 cDNA sequence shown
in Fig. 12 downstream from nucleotide 636 of the coding
region.

The invention also provides methods capable of
detecting any polymorphism from cDNA. In these methods, the
full-length 2C19 cDNA sequence is usually amplified. Analysis
is often performed by sequencing a segment of the 2C19 cDNA
amplification product.

The invention provides further methods for
diagnosing polymorphisms in genomic DNA. In these methods,
genomic DNA is digested with a restriction enzyme that
recognizes a site that includes nucleotide 636 or 681 of the
coding region. The digestion products are then detected by
Southern blotting with a labelled segment of the 2C19 DNA
sequence as a probe.

In another aspect of the invention, diagnostic kits
are provided. Some diagnostic kits comprise forward and
reverse primers. The forward primer is sufficiently
complementary with a first subsequence of the antisense strand
of a double-stranded 2C19 genomic DNA sequence to hybridize
therewith, and the reverse primer is sufficiently complementary
with a second subsequence of the sense strand of the 2C19
genomic sequence to hybridize therewith. For example, in some
kits for diagnosis of the 681 polymorphism, the first
subsequence is upstream of nucleotide 681 of the coding
region, and the second subsequence is downstream of nucleotide
681 of the coding region. Similarly, in some kits for
diagnosis of the 636 polymorphism, the first subsequence is
upstream of nucleotide 636 of the coding region, and the
second subsequence is downstream of nucleotide 636 of the
coding region.


BRIEF DESCRIPTION OF THE DRAWINGS 
Figure 1 shows Western blots of human liver
microsomal proteins. Microsomal proteins were separated by
SDS-polyacrylamide gel electrophoresis. Blot A was performed
using polyclonal antibody to 2C9 and blot B with anti-2C8
(HLx). Each lane represents 20 μg of microsomal protein from
an individual liver. The 2C8 antibody also recognized
purified rat P450 2C13 (g). cDNA libraries were constructed
from livers 860624 (low HLx) and S33 (high HLx).

Figure 2 contains nucleotide sequences of human P450
2C cDNAs. 2c (SEQ. ID. No. 14) is indicated in the top line
and represents the consensus sequence where information from
more than one sequence is available. Sequences were
determined by the dideoxy chain termination method. The
differences observed for clones 25 (SEQ. ID. No. 4) and 65
(SEQ. ID. No. 10) are underlined. The termination codons are
starred. The heme binding region and polyadenylation signals
are underlined. The one-base difference between 29c (SEQ. ID.
No. 6) and 6b (SEQ. ID. No. 12) is also underlined. The
termination codon is starred. The new allelic variant
proteins of 2C18, referred to as 29c (SEQ. ID. No. 5) and 6b
(SEQ. ID. No. 11), and the new protein of 2C19, referred to as
11a (SEQ. ID. No. 1), are compared with the protein of 2C8,
referred to as 2C8 (SEQ. ID. No. 7), and the allelic variant
proteins of 2C9, referred to as 65 (SEQ. ID. No. 9) and 25
(SEQ. ID. No. 3).

Figure 3 depicts a comparison of amino acid
sequences of cytochrome P450 2C8 allelic variants.

Figure 4 depicts a Western blot of recombinant
transformed COS-1 cells. Each lane represents microsomal
protein (50 μg) from an independent transformation with the
indicated P450 2C cDNA, mock-transfected cells (CON), 20 μg of
human liver microsomal protein (liver S5), or 2 pmol of pure
P450g (2C13).

Figure 5 shows a Northern blot of human mRNAs. Each
lane represents 10 μg of mRNA, and the blot was probed with
end-labeled T300R, an oligoprobe specific for 2C8 (SEQ. ID.
No. 8) (top), stripped, and reprobed with 32P-actin cDNA
(bottom).

Figure 6: Western blots of yeast microsomes
expressing recombinant P450 2C cDNAs. CON=control (yeast
microsomes lacking recombinant proteins).

Figure 7: Linearity of S-mephenytoin 4'-hydroxylase
activity and amount of recombinant cytochrome P450 2C19.

Figure 8: S-mephenytoin 4'-hydroxylase activity as
a function of the molar ratio of cytochrome b5 to recombinant
cytochrome P450.

Figure 9: HPLC radiochromatograms of metabolites
formed after incubation of labelled mephenytoin with P450 2C
enzymes, human liver microsomes and yeast control.

Figure 10: Comparison of liver content of
cytochrome P450 2C enzymes with S-mephenytoin 4'-hydroxylase
activity. The upper part of the figure shows Western blots of
liver samples from 16 individuals. The lower part of the
figure shows the S-mephenytoin 4'-hydroxylation activity and
ratios of S/R mephenytoin 4'-hydroxylase activity in each
sample.

Figure 11: Correlation between hepatic 2C19 content
and S-mephenytoin hydroxylase activity based on the data shown
in Figure 10.

Figure 12: Sequence alignment of PCR products from
normal and aberrantly spliced CYP2C19 cDNAs (SEQ. ID. Nos. 45
and 47), with the corresponding amino acid translations (SEQ.
ID. Nos. 46 and 48) indicated above and below the nucleotide
sequence. The new termination codon TAA in the aberrant cDNA
is indicated by the word END and the asterisk. The PCR
primers are indicated by the horizontal arrows in the
sequence. The aberrant CYP2C19 cDNA is missing 40 base pairs
of the cDNA in poor metabolizers as indicated by the dotted
line.

Figure 13: A. Diagram of strategy to amplify
CYP2C19 cDNA transcripts from human liver samples. The
sequence for the PCR primers is indicated in Fig. 12. This
strategy yielded a 284 bp band for the normal cDNA, a 244 bp
band for the aberrant cDNA and both bands with cDNA from
heterozygous individuals. The hatched area indicates the 40
bp deleted in exon 5 of the aberrant cDNA. B. Relation
between genotype as assessed by reverse transcription PCR
(RT-PCR) of human liver mRNA, CYP2C19 protein estimated by
immunoblotting, S-mephenytoin hydroxylation activity, and the
ratio of metabolism of the R/S enantiomers. In vitro
phenotype was based on high (E), intermediate (I) or low (P)
S-mephenytoin 4'-hydroxylase activity.

Figure 14: A. Diagram showing strategy used to
genotype genomic DNA from human blood. B. Diagram of family
of propositus 61 (arrow) showing the pedigree and the gel of
SmaI-digested PCR products. C. Analysis of genomic DNA from
selected Caucasian subjects from the United States or from
Switzerland. The phenotype (EM, IM or PM) is indicated in the
brackets above the gel. D. Analysis of genomic DNA from
selected Oriental subjects.

Figure 15: A. Partial sequence of the intron
4/exon 5 junction of CYP2C19 in extensive and poor
metabolizers (SEQ. ID. Nos. 49 and 50). Intron sequences are
shown in lower case and exon sequences in capitals. The
nucleotides deleted in the aberrantly spliced cDNA are
indicated in bold. The polymorphic SmaI site is underlined in
2C19(wt). The highly conserved AG residues at the intron/exon
junction are shown in black boxes. The consensus sequence
((Y)11NCAGG) (Y=pyrimidine, R=purine, N=any base) for the 3'
splice site is indicated underneath the normal and cryptic
splice junctions. The branch point consensus sequence (CURAY)
is placed underneath two putative branch points. B.
Sequencing of PCR products of genomic DNA from three
individuals who were homozygous normal, heterozygous, and
homozygous defective (based on their SmaI restriction
digests). The polymorphic SmaI restriction site is indicated
by the bracket in the homozygous wt sequence. The G→A base
pair change corresponding to position 681 of the cDNA is also
indicated. C. Schematic representation of splicing in
CYP2C19wt and in CYP2C19m. The black box indicates the 40 bp
that are deleted in exon 5 of poor metabolizers.

Figure 16: Additional 2C19 genomic sequence
flanking the 681 polymorphism. The wildtype (SEQ. ID. No. 51)
and mutant (SEQ. ID. No. 61) sequences are identical except
for the G/A transposition at nucleotide 681. Regions of
sequence ambiguity are indicated in lower case (n=any
nucleotide, k=G/T ambiguity, r=A/G ambiguity, m=A/C
ambiguity).

Figure 17: Genomic DNA sequence flanking the 636
polymorphism (also referred to as m2). Wildtype and mutant
sequences are designated SEQ. ID. Nos. 52 and 54 respectively.
Intron sequences are indicated in lower case and exons in
capitals. Translated amino acids (SEQ. ID. No. 53) are
indicated above the nucleotide sequence. The numbers
underneath the sequences indicate the first (482) and last
(642) nucleotides in exon 4. The two mutations found in exon
4 are indicated in bold. The aberrant stop codon is indicated
by the word "End." Exemplary primers for PCR amplification
are underlined.

Figure 18: Diagnosis of the 636 mutation in 2C19. The
position of the PCR primers is indicated by arrows at 79-55
base pairs in intron 3 and 70-89 bp in intron 4. The size of
the PCR products expected in the wild type gene (wt) and the
size of the product in the 636 mutant allele are shown in the
bottom lines.

Figure 19: Simultaneous detection of the 636 and 
681 mutations. 

DEFINITIONS

Abbreviations for the twenty naturally occurring
amino acids follow conventional usage (Immunology - A
Synthesis (E.S. Golub & D.R. Green, eds., Sinauer Associates,
Sunderland, MA, 2nd ed., 1991), hereby incorporated by
reference for all purposes). Stereoisomers (e.g., D-amino
acids) of the twenty conventional amino acids, unnatural amino
acids such as α,α-disubstituted amino acids, N-alkyl amino
acids, lactic acid, and other unconventional amino acids may
also be suitable components for polypeptides of the present
invention. Examples of unconventional amino acids include:
4-hydroxyproline, γ-carboxyglutamate, ε-N,N,N-trimethyllysine,
ε-N-acetyllysine, O-phosphoserine, N-acetylserine,
N-formylmethionine, 3-methylhistidine, 5-hydroxylysine,
ω-N-methylarginine, and other similar amino acids and imino acids
(e.g., 4-hydroxyproline). In the polypeptide notation used
herein, the left-hand direction is the amino-terminal
direction and the right-hand direction is the carboxy-terminal
direction, in accordance with standard usage and convention.
Similarly, unless specified otherwise, the left-hand end of
single-stranded polynucleotide sequences is the 5' end; the
left-hand direction of double-stranded polynucleotide sequences
is referred to as the 5' direction. The direction of 5' to 3'
addition of nascent RNA transcripts is referred to as the
transcription direction; sequence regions on the DNA strand
that are 5' to the 5' end of the RNA transcript are referred
to as "upstream sequences"; sequence regions on the DNA strand
that are 3' to the 3' end of the RNA transcript are referred
to as "downstream sequences".

The phrase "polynucleotide sequence" refers to a
single or double-stranded polymer of deoxyribonucleotide or
ribonucleotide bases read from the 5' to the 3' end. It
includes self-replicating plasmids, infectious polymers of DNA
or RNA and non-functional DNA or RNA.

The following terms are used to describe the
sequence relationships between two or more polynucleotides:
"reference sequence", "comparison window", "sequence
identity", "percentage of sequence identity", and "substantial
identity". A "reference sequence" is a defined sequence used
as a basis for a sequence comparison; a reference sequence may
be a subset of a larger sequence, for example, as a segment of
a full-length cDNA or gene sequence given in a sequence
listing, such as a polynucleotide sequence shown in SEQ. ID.
No. 2, or may comprise a complete cDNA or gene sequence.
Generally, a reference sequence is at least 20 nucleotides in
length, frequently at least 25 nucleotides in length, and
often at least 50 nucleotides in length. Since two
polynucleotides may each (1) comprise a sequence (i.e., a
portion of the complete polynucleotide sequence) that is
similar between the two polynucleotides, and (2) may further
comprise a sequence that is divergent between the two
polynucleotides, sequence comparisons between two (or more)
polynucleotides are typically performed by comparing sequences
of the two polynucleotides over a "comparison window" to
identify and compare local regions of sequence similarity. A
"comparison window", as used herein, refers to a conceptual
segment of at least 20 contiguous nucleotide positions wherein
a polynucleotide sequence may be compared to a reference
sequence of at least 20 contiguous nucleotides and wherein the
portion of the polynucleotide sequence in the comparison
window may comprise additions or deletions (i.e., gaps) of 20
percent or less as compared to the reference sequence (which
does not comprise additions or deletions) for optimal
alignment of the two sequences. Optimal alignment of
sequences for aligning a comparison window may be conducted by
the local homology algorithm of Smith & Waterman, Adv. Appl.
Math. 2:482 (1981), by the homology alignment algorithm of
Needleman & Wunsch, J. Mol. Biol. 48:443 (1970), by the search
for similarity method of Pearson & Lipman, Proc. Natl. Acad.
Sci. (USA) 85:2444 (1988), by computerized implementations of
these algorithms (FASTDB (Intelligenetics), BLAST (National
Center for Biotechnology Information), or GAP, BESTFIT, FASTA,
and TFASTA (Wisconsin Genetics Software Package Release 7.0,
Genetics Computer Group, 575 Science Dr., Madison, WI)), or by
inspection, and the best alignment (i.e., resulting in the
highest percentage of sequence similarity over the comparison
window) generated by the various methods is selected. The
term "sequence identity" means that two polynucleotide
sequences are identical (i.e., on a nucleotide-by-nucleotide
basis) over the window of comparison. The term "percentage of
sequence identity" (also sometimes referred to as "percentage
homology") is calculated by comparing two optimally aligned
sequences over the window of comparison, determining the
number of positions at which the identical nucleic acid base
(e.g., A, T, C, G, U, or I) occurs in both sequences to yield
the number of matched positions, dividing the number of
matched positions by the total number of positions in the
window of comparison (i.e., the window size), and multiplying
the result by 100 to yield the percentage of sequence
identity. The term "substantial identity" as used herein
denotes a characteristic of a polynucleotide sequence, wherein
the polynucleotide comprises a sequence that has at least 85
percent sequence identity, preferably at least 96 percent
sequence identity, more usually at least 97, 98 or 99 percent
sequence identity as compared to a reference sequence over a
comparison window of at least 20 nucleotide positions,
frequently over a window of at least 25-50 nucleotides,
wherein the percentage of sequence identity is calculated by
comparing the reference sequence to the polynucleotide
sequence which may include deletions or additions which total
20 percent or less of the reference sequence over the window
of comparison. The reference sequence may be a subset of a
larger sequence, for example, as a segment of the full-length
sequence of SEQ. ID. Nos. 2, 6 or 12.
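
The percentage calculation defined above can be sketched in a
few lines of Python. This is an illustrative sketch only: it
assumes the two sequences have already been optimally aligned
by one of the cited methods and are supplied as equal-length
strings in which "-" marks a gap. The function names, the gap
symbol, and the way gaps are counted against the 20 percent
limit are assumptions made for illustration and are not taken
from the specification.

```python
def percent_identity(aligned_ref: str, aligned_query: str, gap: str = "-") -> float:
    """Matched positions divided by the window size, times 100."""
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have the same length")
    window = len(aligned_ref)
    if window == 0:
        raise ValueError("comparison window is empty")
    # A position counts as matched only if both sequences carry the same base.
    matches = sum(
        1
        for r, q in zip(aligned_ref.upper(), aligned_query.upper())
        if r == q and r != gap
    )
    return 100.0 * matches / window


def meets_substantial_identity(
    aligned_ref: str,
    aligned_query: str,
    threshold: float = 85.0,
    min_window: int = 20,
    max_gap_fraction: float = 0.20,
    gap: str = "-",
) -> bool:
    """Apply the window-size, gap and identity thresholds quoted in the text."""
    ref_len = sum(1 for r in aligned_ref if r != gap)
    # Assumption: every gapped column counts toward the 20 percent allowance.
    gap_columns = sum(
        1 for r, q in zip(aligned_ref, aligned_query) if r == gap or q == gap
    )
    if ref_len < min_window or gap_columns > max_gap_fraction * ref_len:
        return False
    return percent_identity(aligned_ref, aligned_query, gap) >= threshold
```

For example, a 20-position window with two mismatches and no
gaps gives 18 matched positions and hence 90 percent identity,
which satisfies the 85 percent threshold but not the preferred
96 percent threshold.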

As applied to polypeptides, the term "substantial
identity" (or "substantial homology") means that two peptide
sequences, when optimally aligned, such as by the programs
BLAZE (Intelligenetics), GAP or BESTFIT using default gap
weights, share at least 85 percent sequence identity,
preferably at least 96 percent sequence identity, more
preferably at least 97, 98 or 99 percent sequence identity or
more (e.g., 99.5 percent sequence identity). Preferably,
residue positions
which are not identical differ by conservative amino acid
substitutions. Conservative amino acid substitutions refer to
the interchangeability of residues having similar side chains.
For example, a group of amino acids having aliphatic side
chains is glycine, alanine, valine, leucine, and isoleucine; a
group of amino acids having aliphatic-hydroxyl side chains is
serine and threonine; a group of amino acids having
amide-containing side chains is asparagine and glutamine; a
group of amino acids having aromatic side chains is
phenylalanine, tyrosine, and tryptophan; a group of amino
acids having basic side chains is lysine, arginine, and
histidine; and a group of amino acids having
sulfur-containing side chains is cysteine and methionine.
Preferred conservative amino acid substitution groups are:
valine-leucine-isoleucine, phenylalanine-tyrosine,
lysine-arginine, alanine-valine, and asparagine-glutamine.
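
The side-chain groups listed above lend themselves to a simple
lookup. The following Python sketch encodes them with standard
one-letter amino acid codes and tests whether a substitution
stays within a group. The dictionary and function names are
illustrative assumptions; only the group memberships come from
the text.

```python
# Side-chain groups from the text, in one-letter amino acid codes.
SIDE_CHAIN_GROUPS = {
    "aliphatic": set("GAVLI"),
    "aliphatic-hydroxyl": set("ST"),
    "amide-containing": set("NQ"),
    "aromatic": set("FYW"),
    "basic": set("KRH"),
    "sulfur-containing": set("CM"),
}

# Preferred conservative substitution groups from the text.
PREFERRED_GROUPS = [set("VLI"), set("FY"), set("KR"), set("AV"), set("NQ")]


def is_conservative(residue_a: str, residue_b: str, groups=None) -> bool:
    """True if two different residues fall within one substitution group."""
    if groups is None:
        groups = PREFERRED_GROUPS
    a, b = residue_a.upper(), residue_b.upper()
    if a == b:
        return False  # identical residues are not a substitution
    return any(a in group and b in group for group in groups)
```

Under this sketch a lysine-to-histidine change is conservative
with respect to the broader basic side-chain group but is not
among the preferred substitution groups.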

The term "substantially pure" means an object
species is the predominant species present (i.e., on a molar
basis it is more abundant than any other individual species in
the composition), and preferably a substantially purified
fraction is a composition wherein the object species comprises
at least about 50 percent (on a molar basis) of all
macromolecular species present. Generally, a substantially
pure composition will comprise more than about 80 to 90
percent of all macromolecular species present in the
composition. Most preferably, the object species is purified
to essential homogeneity (contaminant species cannot be
detected in the composition by conventional detection methods)
wherein the composition consists essentially of a single
macromolecular species.

The term "naturally-occurring" as used herein as
applied to an object refers to the fact that an object can be
found in nature. For example, a polypeptide or polynucleotide
sequence that is present in an organism (including viruses)
that can be isolated from a source in nature and which has not
been intentionally modified by man in the laboratory is
naturally-occurring.

The term "epitope" includes any protein determinant
capable of specific binding to an immunoglobulin or T-cell
receptor. Epitopic determinants usually consist of chemically
active surface groupings of molecules such as amino acids or
sugar side chains and usually have specific three dimensional
structural characteristics, as well as specific charge
characteristics.

Specific binding exists when the dissociation
constant for a dimeric complex is ≤ 1 µM, preferably ≤ 100 nM
and most preferably ≤ 1 nM.

The term "allelic variants" refers to gene sequences
mapping to the same chromosomal location in different
individuals of a species but showing a small degree of sequence
divergence from each other. Typically, allelic variants
encode polypeptides exhibiting at least 96% or 97% amino acid
sequence identity with each other.

The term "nonallelic variants" refers to gene
sequences that show similar structural and/or functional
properties but map at different chromosomal locations in an
individual. In the 2C family, nonallelic variants typically
exhibit 70-96% amino acid sequence identity with each other.
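
The typical identity ranges quoted for allelic and nonallelic
variants can be expressed as a small classifier. This Python
sketch is illustrative only: the function name and the returned
labels are assumptions, and the 96% boundary is assigned to the
allelic range here although the two quoted ranges meet at that
value. Amino acid identity alone does not establish allelism,
which by definition also depends on chromosomal location.

```python
def classify_by_identity(percent_aa_identity: float) -> str:
    """Map amino acid identity onto the typical ranges quoted in the text."""
    if not 0.0 <= percent_aa_identity <= 100.0:
        raise ValueError("identity must lie between 0 and 100 percent")
    if percent_aa_identity >= 96.0:
        return "typical of allelic variants"
    if percent_aa_identity >= 70.0:
        return "typical of nonallelic 2C variants"
    return "below the typical 2C range"
```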

The term "cognate variants" refers to gene sequences
that are evolutionarily and functionally related between
humans and other species such as primates, porcines, bovines
and rodents such as mice and rats. Thus, the cognate primate
gene to a human 2C19 gene is the primate gene which encodes an
expressed protein which has the greatest degree of sequence
identity to the 2C19 protein and which exhibits an expression
pattern similar to that of the 2C19 protein.

Stringent conditions are sequence dependent and will
be different in different circumstances. Generally, stringent
conditions are selected to be about 5°C lower than the
thermal melting point (Tm) for the specific sequence at a
defined ionic strength and pH. The Tm is the temperature
(under defined ionic strength and pH) at which 50% of the
target sequence hybridizes to a perfectly matched probe.
Typically, stringent conditions will be those in which the
salt concentration is at least about 0.02 molar at pH 7 and
the temperature is at least about 60°C. As other factors may
significantly affect the stringency of hybridization,
including, among others, base composition and size of the
complementary strands, the presence of organic solvents and
the extent of base mismatching, the combination of parameters
is more important than the absolute measure of any one.
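
The "about 5°C below Tm" guideline can be illustrated with a
short Python sketch. The specification does not give a formula
for Tm; the sketch below substitutes the widely used Wallace
rule (2°C per A or T, 4°C per G or C), which is only a rough
estimate for short oligonucleotides and ignores the ionic
strength, pH and solvent effects noted above. The function
names and the example oligonucleotide are hypothetical.

```python
def wallace_tm(oligo: str) -> float:
    """Rough Tm estimate (deg C) by the Wallace rule; an assumption, not from the text."""
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("sequence must contain only A, C, G and T")
    return 2.0 * at + 4.0 * gc


def stringent_temperature(tm_celsius: float, offset: float = 5.0) -> float:
    """Hybridization temperature about `offset` deg C below the Tm."""
    return tm_celsius - offset


# Hypothetical 16-mer with eight A/T and eight G/C positions.
tm = wallace_tm("ATGCATGCATGCATGC")
wash_temperature = stringent_temperature(tm)
```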

A polymorphism is a condition in which two or more
different nucleotide sequences of a given DNA sequence coexist
in the same interbreeding population.

The term "oligonucleotide" refers to a molecule
comprised of two or more deoxyribonucleotides or
ribonucleotides, such as primers, probes, nucleic acid
fragments to be detected, and nucleic acid controls. The
exact size of an oligonucleotide depends on many factors and
the ultimate function or use of the oligonucleotide.
Oligonucleotides can be prepared by any suitable method,
including, for example, cloning and restriction of appropriate
sequences and direct chemical synthesis by a method such as
the phosphotriester method of Narang et al., Meth. Enzymol.
68:90-99 (1979); the phosphodiester method of Brown et al.,
Meth. Enzymol. 68:109-151 (1979); the diethylphosphoramidite
method of Beaucage et al., Tetrahedron Lett. 22:1859-1862
(1981); and the solid support method of U.S. Patent No.
4,458,066.

A primer is an oligonucleotide, whether natural or
synthetic, capable of acting as a point of initiation of DNA
synthesis under conditions in which synthesis of a primer
extension product complementary to a nucleic acid strand is
induced, i.e., in the presence of four different nucleoside
triphosphates and an agent for polymerization (i.e., DNA
polymerase or reverse transcriptase) in an appropriate buffer
and at a suitable temperature.

"Probe" refers to an oligonucleotide which binds
through complementary base pairing to a subsequence of a
target nucleic acid. Probes will typically hybridize to
target sequences lacking complete complementarity with the
probe sequence on reducing the stringency of the hybridization
conditions. The probes are preferably directly labelled, as
with isotopes, or indirectly labelled, such as with biotin to
which a streptavidin complex may later bind. By assaying for
the presence or absence of the probe, one can detect the
presence or absence of the target.

"Subsequence" refers to a sequence of nucleic acids
that comprises a part of a longer sequence of nucleic acids.

The term "target region" refers to a region of a
nucleic acid to be analyzed such as a polymorphic region.

Hybridization refers to binding between an
oligonucleotide and a target sequence via complementary base
pairing to achieve the desired priming by PCR polymerases or
detection of hybridization signal, and sometimes embraces
minor mismatches that can be accommodated by reducing the
stringency of the hybridization conditions.

DESCRIPTION OF THE SPECIFIC EMBODIMENTS

The invention provides novel cytochrome P450 2C
polypeptides, DNA fragments encoding these polypeptides and
cell lines expressing the polypeptides. The invention also
provides methods of using the novel polypeptides for, inter
alia, identifying drugs metabolized by S-mephenytoin
4'-hydroxylase activity.

I. Polypeptides 

In one embodiment, the invention provides novel
cytochrome P450 2C polypeptides, designated 2C18 and 2C19.
The 2C18 and 2C19 proteins are nonallelic with each other and
with known 2C polypeptides. An exemplary 2C19 polypeptide has
the amino acid sequence designated SEQ. ID. No. 1. The
invention also provides allelic variants of the exemplified
2C19 polypeptide, and natural and induced mutants of such
variants. The invention provides human 2C19 polypeptides and
cognate variants thereof. Typically, 2C19 variants exhibit
substantial sequence identity (e.g., at least 96% or 97% amino
acid sequence identity) with the exemplified 2C19 polypeptide
and cross-react with antibodies specific to this polypeptide.
2C19 variants are usually encoded by nucleic acids that show
substantial sequence identity (e.g., at least 96% or 97%
sequence identity) with the nucleic acid encoding the
exemplified 2C19 variant (SEQ. ID. No. 2).

Some 2C19 polypeptides, including the exemplified
polypeptide, exhibit high levels of stereospecific
S-mephenytoin 4'-hydroxylase activity. See Table IV. Indeed,
it is highly probable that 2C19 represents the principal human
determinant of this activity. Typically such 2C19
polypeptides exhibit a stereospecific S-mephenytoin
4'-hydroxylase activity of about 0.5-100, 1-10 or about 4-6
nmol S-mephenytoin per nmol 2C19 polypeptide per minute.
Frequently, the activity of 2C19 polypeptides is higher than
that of native human liver microsomes. The activity of such
polypeptides for the R-enantiomer of mephenytoin is typically
at least 10, 50 or 100-fold lower.
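
The activity units used above (nmol substrate per nmol P450 per
minute) and the fold difference between enantiomers reduce to
two divisions, sketched here in Python. The function names and
all numerical inputs are hypothetical and are not measurements
reported in this specification.

```python
def turnover(product_amount: float, p450_amount: float, minutes: float) -> float:
    """Substrate converted per unit of P450 per minute.

    product_amount and p450_amount must share one molar unit (e.g., nmol),
    so the result is in nmol per nmol per minute.
    """
    if p450_amount <= 0 or minutes <= 0:
        raise ValueError("P450 amount and incubation time must be positive")
    return product_amount / (p450_amount * minutes)


def fold_selectivity(s_rate: float, r_rate: float) -> float:
    """How many fold faster the S-enantiomer is metabolized than the R-enantiomer."""
    if r_rate <= 0:
        raise ValueError("R-enantiomer rate must be positive")
    return s_rate / r_rate


# Hypothetical incubation: 750 pmol product from 5 pmol P450 in 30 minutes.
s_rate = turnover(750, 5, 30)
# Hypothetical R-enantiomer rate of 0.25 in the same units.
fold = fold_selectivity(s_rate, 0.25)
```

With these hypothetical inputs the S-enantiomer rate falls in
the 4-6 range quoted above and the enantiomer ratio exceeds the
10-fold level.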

Other 2C19 polypeptides may lack substantial
stereospecific S-mephenytoin 4'-hydroxylase activity. Such
polypeptides represent allelic variants of the exemplified
2C19 polypeptide. These polypeptides sometimes exhibit low
levels of mephenytoin 4'-hydroxylase activity (i.e., less than
about 0.5 or 0.2 nmol mephenytoin per nmol 2C19 polypeptide
per minute). This activity may or may not be
stereospecific. Although the presence of a 2C19 polypeptide
with low enzymic activity could account for the phenotype of a
few individuals defective in S-mephenytoin 4'-hydroxylase
activity, the phenotype in most such individuals results from
a complete or substantial absence of 2C19 polypeptide. See,
e.g., Figure 10.

The invention also provides 2C18 polypeptides. The
amino acid sequences of two allelic variants of 2C18 are
designated SEQ. ID. Nos. 5 and 11. Also provided are allelic
variants of the exemplified 2C18 polypeptides, conjugated
variants thereof, and natural and induced mutants of any of
these. Typically, 2C18 variants exhibit substantial sequence
identity (e.g., at least 96% or 97% amino acid sequence
identity) with the exemplified 2C18 polypeptides and
cross-react with antibodies specific to these polypeptides.
2C18 variants are usually encoded by nucleic acids that show
substantial sequence identity (e.g., at least 96% or 97%
sequence identity) with the nucleic acid encoding the
exemplified 2C18 variants (SEQ. ID. Nos. 6 and 12).

2C18 polypeptides typically show low levels of
mephenytoin 4'-hydroxylase activity (0.01-0.2 nmol mephenytoin
per nmol 2C18 polypeptide per min). For some 2C18
polypeptides, the activity shows a small degree of

other species, allelic variants, and natural and induced
mutants of any of these. Specifically, all nucleic acid
fragments encoding all 2C18 and 2C19 polypeptides disclosed in
this application are provided. Genomic libraries of many
species are commercially available (e.g., Clontech, Palo Alto,
CA), or can be isolated de novo by conventional procedures.
cDNA libraries are best prepared from liver extracts.

The probes used for isolating clones typically
comprise a sequence of at least about 15, 20 or 25 contiguous
nucleotides (or their complement) of an exemplified DNA
sequence (i.e., SEQ. ID. Nos. 2, 6 or 12). Preferably probes
are selected from regions of the exemplified sequences that
show a high degree of variation between different 2C
nonallelic variants. Hypervariable regions are the nucleic
acids encoding amino acids 181-210, 220-248, 283-296 and
461-479. Probes from these regions are likely to hybridize to
allelic variants but not to nonallelic variants of the
exemplified sequences under stringent conditions. Allelic
variants can be isolated by hybridization screening of plaque
lifts (Benton & Davis, Science 196:180 (1977)). Alternatively,
cDNAs can be prepared from liver mRNA by polymerase chain
reaction (PCR) methods. 5'- and 3'-specific primers for 2C19
are designed based on the nucleotide sequence designated SEQ.
ID. No. 2. See generally PCR Technology: Principles and
Applications for DNA Amplification (ed. H.A. Erlich, Freeman
Press, NY, NY, 1992); PCR Protocols: A Guide to Methods and
Applications (eds. Innis, et al., Academic Press, San Diego,
CA, 1990); Mattila et al., Nucleic Acids Res. 19:4967 (1991);
Eckert et al., PCR Methods and Applications 1:17 (1991); PCR
(eds. McPherson et al., IRL Press, Oxford); and U.S. Patent
4,683,202 (each of which is incorporated by reference for all
purposes).
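
Selecting probes from the hypervariable regions named above
requires converting amino acid positions into nucleotide
coordinates. The Python sketch below shows that conversion and
enumerates candidate probes of a chosen length. It assumes that
amino acid 1 is the initiator methionine and that nucleotide 1
is the A of its ATG codon; any 5'-noncoding offset present in a
sequence listing would have to be added by the user. The
function names and the placeholder coding sequence are
hypothetical.

```python
# Amino acid ranges of the hypervariable regions given in the text.
HYPERVARIABLE_AA = [(181, 210), (220, 248), (283, 296), (461, 479)]


def codon_range_to_cds(aa_start: int, aa_end: int) -> tuple:
    """1-based, inclusive nucleotide coordinates encoding aa_start..aa_end."""
    if aa_start < 1 or aa_end < aa_start:
        raise ValueError("invalid amino acid range")
    return 3 * (aa_start - 1) + 1, 3 * aa_end


def candidate_probes(cds: str, aa_start: int, aa_end: int, length: int = 20) -> list:
    """All contiguous probes of the given length lying within one region."""
    nt_start, nt_end = codon_range_to_cds(aa_start, aa_end)
    if nt_end > len(cds):
        raise ValueError("coding sequence is shorter than the requested region")
    region = cds[nt_start - 1:nt_end]
    return [region[i:i + length] for i in range(len(region) - length + 1)]


# Placeholder coding sequence of 1500 nt, used only to exercise the functions.
placeholder_cds = "ACGT" * 375
probes = candidate_probes(placeholder_cds, 181, 210, length=20)
```

Each candidate would still need to be checked against the other
2C sequences before use, since the regions are described only as
showing a high degree of variation.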

Nucleotide substitutions, deletions, and additions
can be incorporated into the polynucleotides of the invention.
Nucleotide sequence variation may result from degeneracy of
the genetic code, from sequence polymorphisms of 2C18 and 2C19
alleles, or from minor sequencing errors, or may be introduced
by random mutagenesis of the encoding nucleic acids using
irradiation or exposure to EMS, or by changes engineered by
site-specific mutagenesis or other techniques. See Sambrook
et al., Molecular Cloning: A Laboratory Manual (C.S.H.P.
Press, NY, 2d ed., 1989) (incorporated by reference for all
purposes).

III. Cell Lines 

In another embodiment of the invention, cell lines
capable of expressing the nucleic acid segments described
above are provided. Stable cell lines are preferred to cell
lines conferring transient expression. Stable cell lines can
be passaged at least fifty times without reduction in the
level of 2C polypeptides expressed by the cell lines.
Preferably, cell lines are capable of being cultured so as to
express 2C polypeptides at high levels, usually at least 0.2,
1, 10, 20, 50, 100, 200 or 500 pmol of 2C polypeptide per mg
of microsomal protein. For example, the 2C19 expression level
of many cell lines of the invention is typically about
0.2-10,000, 1-200, 7-100, 10-50 or 10-20 pmol 2C19 polypeptide
per mg microsomal protein. An expression level of 10 pmol 2C19
per mg microsomal protein means that 2C19 represents about
0.06% of total cellular protein. For E. coli and insect cell
lines, the recombinant P450 protein can comprise 5-10% of
total cellular protein. Often, the stable cell lines of the
invention express more than one P450 polypeptide. These cell
lines express 2C18 and/or 2C19 together with other members of
the 2C family, or other P450 cytochromes such as 1A1, 1A2,
2A6, 3A3, 3A4, 2B6, 2B7, 2C9, 2D6, and/or 2E1.
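
The conversion from pmol of P450 per mg of protein to a mass
percentage can be checked with the short Python sketch below.
The molecular mass of about 56,000 Da is an assumption made for
illustration (the specification does not state it here), and the
function name is hypothetical. The result is a mass fraction of
whichever protein pool the pmol-per-mg figure is normalized to.

```python
def p450_mass_percent(pmol_per_mg: float, molecular_mass_da: float = 56_000.0) -> float:
    """Mass of P450 as a percentage of the protein it is normalized to.

    molecular_mass_da is an assumed value, not taken from the specification.
    """
    # pmol x (g/mol) gives picograms; 1e-6 converts picograms to micrograms.
    micrograms_p450 = pmol_per_mg * molecular_mass_da * 1e-6
    micrograms_total = 1000.0  # one milligram of protein
    return 100.0 * micrograms_p450 / micrograms_total


percent_at_10_pmol = p450_mass_percent(10)
```

Under the stated assumption, 10 pmol per mg corresponds to
0.56 µg of P450 per mg of protein, or roughly 0.06 percent by
mass, in line with the figure quoted above.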

E. coli is one prokaryotic host useful for cloning
the polynucleotides of the present invention. Other microbial
hosts suitable for use include bacilli, such as Bacillus
subtilis, and other enterobacteriaceae, such as Salmonella,
Serratia, and various Pseudomonas species. Expression vectors
typically contain expression control sequences compatible with
the host cell, e.g., an origin of replication, any of a
variety of well-known promoters, such as the lactose promoter
system, a tryptophan (trp) promoter system, a beta-lactamase
promoter system, or a promoter system from phage lambda.
Vectors often also contain an operator sequence and/or a
ribosome binding site. The control sequences are operably
linked to a P450 DNA segment so as to ensure its
expression, and control the expression thereof.

Other microbes, such as fungi, especially yeast,
are particularly useful for expression. Saccharomyces is a
preferred host, with suitable vectors having expression
control sequences, such as promoters, including those of
3-phosphoglycerate kinase or other glycolytic enzymes, and an
origin of replication, termination sequences and the like as
desired. For example, the plasmid pAAH5 can be used. The
5'-noncoding sequence of the P450 2C cDNAs can be eliminated
and six adenosines added by polymerase chain reaction (PCR)
amplification to optimize expression in yeast cells. The 5'-
and 3'-primers recommended for amplification of 2C18 are
5'-GCAAGCTTAAAAAATGGATCCAGCTGTGGCTCT-3' (SEQ. ID. No. 15) and
5'-GCAAGCTTGCCAAACTATCTGCCCTTCT-3' (SEQ. ID. No. 16). This
includes addition of a Hind III restriction site at both ends
to allow insertion into the pAAH5 vector and six adenosines
at the 5'-end to optimize translation. The final 20 bases of
each sequence are specific for 20 bases at the 5'-end of 2C18
starting with the ATG for methionine and 20 bases of the
3'-noncoding region. The primers for 2C19 can be constructed
similarly. The yeast strain used, Saccharomyces cerevisiae
334, can be propagated non-selectively in YPD medium (1% yeast
extract, 2% peptone, 2% dextrose) (Hovland et al. (1989) Gene
83, 57-64) and Leu+ transformants selected on synthetic
minimal medium containing 0.67% nitrogen base (without amino
acids), 0.5% ammonium sulfate, 2% dextrose and 20 µg/ml
L-histidine (SD+His). Plates are made by the addition of 2%
agar. Yeast can be transformed by the lithium acetate method
of Ito et al. (1983) J. Bacteriol. 153, 163 and transformants
selected on SD+His. Cells are then grown
to mid-logarithmic phase (Oeda et al., DNA 4:203-210 (1985))
and microsomes containing recombinant protein can be prepared.
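
The primer layout described above for yeast expression (a
leading GC, a Hind III site, a run of adenosines for the
5'-primer, and 20 gene-specific bases) can be sketched in
Python as follows. This is an illustrative sketch: the function
names are hypothetical, the number of adenosines is left as a
parameter and has not been checked against the sequence
listing, and the sense-strand 3'-noncoding 20-mer used in the
example was obtained simply by reverse-complementing the last
20 bases of the 3'-primer printed above.

```python
HIND_III_SITE = "AAGCTT"  # Hind III recognition sequence
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def forward_primer(cds: str, lead: str = "GC", n_adenosines: int = 6, anneal: int = 20) -> str:
    """Lead + Hind III site + adenosines + first `anneal` bases from the ATG."""
    if not cds.upper().startswith("ATG"):
        raise ValueError("coding sequence should start with ATG")
    return lead + HIND_III_SITE + "A" * n_adenosines + cds[:anneal].upper()


def reverse_primer(sense_3prime_region: str, lead: str = "GC") -> str:
    """Lead + Hind III site + reverse complement of the chosen sense-strand bases."""
    return lead + HIND_III_SITE + reverse_complement(sense_3prime_region)


# First 20 coding bases of 2C18 as they appear at the end of the 5'-primer.
fwd = forward_primer("ATGGATCCAGCTGTGGCTCT")
# Sense-strand 20-mer derived from the printed 3'-primer (see lead-in).
rev = reverse_primer("AGAAGGGCAGATAGTTTGGC")
```

Because the Hind III site is palindromic, the same site string
serves for both primers regardless of strand.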

Insect cells (e.g., SF9) with appropriate vectors,
usually derived from baculovirus, are also suitable for
expressing 2C polypeptides. See Luckow et al., Bio/Technology
6:47-55 (1988) (incorporated by reference for all purposes).

Mammalian tissue cell culture can also be used to
express and produce the polypeptides of the present invention
(see Winnacker, From Genes to Clones (VCH Publishers, N.Y.,
N.Y., 1987)). Suitable host cell lines include CHO cell lines
(e.g., V79) (Dogram et al. (1990) Mol. Pharmacol. 37,
607-613), various COS cell lines, HeLa cells, myeloma cell
lines and Jurkat cells, hepatoma cell lines (Hep G2), and a
lymphoblastoid cell line AHH-1 TK+/- (Crespi et al. (1991)
Carcinogenesis 12, 355-359). Expression vectors for these
cells (e.g., pEBVHistK or pSV2) can include expression control
sequences, such as an origin of replication, a promoter (e.g.,
a HSV tk promoter or pgk (phosphoglycerate kinase) promoter),
an enhancer (Queen et al., Immunol. Rev. 89:49 (1986)), and
necessary processing information sites, such as ribosome
binding sites, RNA splice sites, polyadenylation sites (e.g.,
an SV40 large T Ag poly A addition site), and transcriptional
terminator sequences. Preferred expression control sequences
are promoters derived from immunoglobulin genes, SV40,
adenovirus, bovine papillomavirus, and the like. Expression
control sequences are operably linked to a DNA segment
encoding a P450 polypeptide so as to ensure the polypeptide is
expressed.

The vectors containing the polynucleotide sequences
of interest can be transferred into the host cell by
well-known methods, which vary depending on the type of
cellular host. For example, calcium chloride transfection is
commonly utilized for prokaryotic cells, whereas calcium
phosphate treatment or electroporation may be used for other
cellular hosts. (See generally Sambrook et al., Molecular
Cloning: A Laboratory Manual (Cold Spring Harbor Press, 2nd
ed., 1989) (incorporated by reference in its entirety for all
purposes).)

Once expressed, the polypeptides of the invention
and their fragments can, if desired, be purified according to
standard procedures of the art, including ammonium sulfate
precipitation, affinity columns, column chromatography, gel
electrophoresis and the like (see generally Scopes, Protein
Purification (Springer-Verlag, N.Y., 1982)).

IV. Antibodies

The invention also provides antibodies that
specifically bind to epitopes on the 2C18 and 2C19
polypeptides of the invention. Some antibodies specifically
bind to one member of the 2C family (e.g., 2C19) without
binding to nonallelic forms. Some antibodies specifically
bind to a single allelic form of a 2C member such as the 2C19
polypeptide having the amino acid sequence designated SEQ. ID.
No. 1. Antibodies that specifically bind to a 2C19
polypeptide without binding to a 2C9 polypeptide are
particularly useful in view of the relatively high degree of
sequence identity between these nonallelic variants. See
Table II. The production of non-human monoclonal antibodies,
e.g., murine, lagomorpha, equine, is well known and can be
accomplished by, for example, immunizing an animal with a
preparation containing a 2C19 polypeptide or an immunogenic
fragment thereof. Human antibodies can be prepared using
phage-display technology. See, e.g., Dower et al., WO
91/17271 and McCafferty et al., WO 92/01047 (each of which is
incorporated by reference in its entirety for all purposes).
Humanized antibodies are prepared as described by Queen et
al., WO 90/07861.

V. Methods of Use 

A. Identification of Drugs Unsuitable for
Administration to Poor Metabolizers of S-Mephenytoin

The identification of a 2C19 polypeptide as the
principal determinant of human S-mephenytoin 4'-hydroxylase
activity facilitates methods of screening drugs that are
metabolized by this enzyme. Such drugs likely lack efficacy
and/or show intolerable side effects in individuals having a
defect in S-mephenytoin 4'-hydroxylase activity (low
producers). The substantial absence of this activity in low
producers often results in an inability to detoxify such
drugs, preventing their elimination from the body.
Substantial absence of S-mephenytoin 4'-hydroxylase activity
can also prevent metabolic processing of certain drugs to
activated forms. Drugs suspected of being metabolized by
S-mephenytoin 4'-hydroxylase activity include, in addition to
mephenytoin itself, omeprazole, proguanil, diazepam and
certain barbiturates.

Drugs are screened for metabolic processing by
S-mephenytoin 4'-hydroxylase activity in a variety of assays.
See Example 5. In brief, the drug under test is usually
labelled with a radioisotope or otherwise. The drug is then
contacted with a 2C19 polypeptide exhibiting S-mephenytoin
4'-hydroxylase activity (e.g., the polypeptide designated SEQ.
ID. No. 1). The 2C19 polypeptide can be in purified form or
can be a component of a lysate of one of the cell lines
discussed in Section III. Often, the 2C19 polypeptide is part
of a microsomal fraction of a cell lysate. The 2C19
polypeptide can also be a component of an intact cell as many
drugs are taken up by such cells. Often, the reaction mixture
is supplemented with one or more of the following reagents:
dilauroylphosphatidylcholine, cytochrome P450 reductase, human
cytochrome b5, and NADPH. (See Example 5 for concentrations
of these reagents and a suitable buffer.) After an incubation
period (e.g., 30 min), the reaction is terminated and the
mixture is centrifuged. The supernatant is analyzed for
metabolic activity, e.g., by a spectrographic or
chromatographic method. The assay is usually performed in
parallel on a control reaction mixture without a 2C19
polypeptide. Metabolic activity is shown by a comparative
analysis of supernatants from the test and control reaction
mixtures. For example, a shift in retention time of
radiolabelled peaks between test and control under HPLC
analysis indicates that the drug under test is metabolized by
S-mephenytoin 4'-hydroxylase activity. Often, the test is
repeated using an extract from human liver in place of the
2C19 polypeptide. The appearance of a labelled metabolic peak
from the reaction using 2C19 recombinant organisms or 2C19
recombinant cell fractions having the same HPLC retention
time, and a specific activity at least as high, as that
observed for human liver microsomes provides strong evidence
that S-mephenytoin 4'-hydroxylase activity plays a major role
in processing the drug. The test can also be repeated using
other 2C members, such as 2C18, as controls, in place of 2C19.
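
The comparative analysis of test and control supernatants
described above can be sketched as a comparison of HPLC peak
retention times. The Python sketch below is illustrative only:
the function names, the 0.1-minute matching tolerance and all
retention times are hypothetical, and a real analysis would
also compare peak areas and specific activities as the text
requires.

```python
def new_peaks(test_peaks, control_peaks, tolerance: float = 0.1):
    """Retention times (min) seen in the test run but not in the control run."""
    return [
        t for t in test_peaks
        if all(abs(t - c) > tolerance for c in control_peaks)
    ]


def shared_with_liver(recombinant_peaks, liver_peaks, tolerance: float = 0.1):
    """Metabolite peaks from recombinant 2C19 that co-elute with liver peaks."""
    return [
        t for t in recombinant_peaks
        if any(abs(t - l) <= tolerance for l in liver_peaks)
    ]


# Hypothetical retention times: parent drug at 7.8 min, metabolite at 4.2 min.
metabolites = new_peaks(test_peaks=[4.2, 7.8], control_peaks=[7.8])
confirmed = shared_with_liver(metabolites, liver_peaks=[4.25, 9.0])
```

A metabolite peak that appears only with 2C19 present and
co-elutes with a peak from human liver microsomes is the
pattern the text treats as evidence of 2C19 involvement.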

Drugs can also be screened for metabolic dependence
on S-mephenytoin 4'-hydroxylase activity in transgenic
nonhuman animals. Some such animals have genomes comprising a
2C19 transgene (e.g., SEQ. ID. No. 2) operably linked to
control sequences so as to render the transgene capable of
being expressed in the animals. Other transgenic animals have
a genome containing homozygous null mutations of endogenous
2C19 genes. Mice and other rodents are particularly suitable
for production of transgenic animals. Drugs are administered
to transgenic animals in comparison with normal control
animals and the effects from administration are monitored.
Drugs eliciting different responses in the transgenic animals
than the control animals likely require S-mephenytoin
4'-hydroxylase activity for detoxification and/or activation.

Drugs identified by the above screening methods as
being metabolized by S-mephenytoin 4'-hydroxylase activity
should generally not be administered to individuals known to
be deficient in this enzyme, or should be administered at
different dosages. Indeed, in the absence of data on an
individual patient's S-mephenytoin 4'-hydroxylase phenotype,
it is often undesirable to administer such drugs to any member
of an ethnic group known to be at high risk for S-mephenytoin
4'-hydroxylase deficiency (e.g., Orientals and possibly
blacks). If it is essential to administer drugs identified by
the above screening procedures to individuals known to be at
risk of enzymic deficiency (e.g., no alternative drug is
available), a treating physician is at least apprised of a
need for vigilant monitoring of the patient's response to the
drug. In general, the identification of a new drug as a
substrate for 2C19 would militate against further development
of the drug.

WO 95/30766

PCT/US95/05744

B. Screening Compounds for Mutagenic, Cytotoxic or 
Carcinogenic Activity 

The invention provides methods of measuring the
mutagenic, cytotoxic or carcinogenic potential of a compound.
In some methods, mutagenic, cytotoxic or carcinogenic effects
are assayed directly on a cell line harboring one or more
recombinant cytochrome P450 enzymes. In these methods, a
compound under test is added to the growth medium of a cell
line expressing 2C19, and/or 2C18 and/or other cytochrome
P450s. Often, one or more of the reagents discussed in
Section V(A), supra, is also added. After a suitable
incubation, mutagenic, cytotoxic or carcinogenic effects are
assayed. Mutagenic effects are assayed, e.g., by detection of
the appearance of drug-resistant mutant cell colonies
(Thompson, Methods Enzymol. 58:308 (1979)). For example,
mutagenicity can be evaluated at the hgprt locus (Penman et
al. (1987) Environ. Mol. Mutagenesis 10, 35-60).
Cytotoxicity can be assayed from viability of the cell line
harboring the P450 enzyme(s). Carcinogenicity can be assessed
by determining whether the cell line harboring the P450
enzymes has acquired anchorage-independent growth or the
capacity to induce tumors in athymic nude mice.

In other methods, a suspected compound is assayed in
a selected test cell line rather than a cell line harboring
P450 enzymes. In these methods, the compound under test is
contacted with P450 2C19 and/or 2C18 and/or other P450
enzymes. The P450 enzyme(s) can be provided in purified form,
or as components of lysates or microsomal fractions of cells
harboring the recombinant enzyme(s). The P450 enzyme(s) can
also be provided as components of intact cells. Usually, one
or more of the reagents discussed in Section V(A), supra, is
also added. Optionally, the appearance of metabolic products
from the suspected compound can be monitored by techniques
such as thin layer chromatography or high performance liquid
chromatography and the like.

The metabolic products resulting from treatment of
the suspected compound with P450 enzyme(s) are assayed for
mutagenic, cytotoxic or carcinogenic activity in a test cell
line. The test cell line can be present during the metabolic
activation of the mutagen or can be added after activation has
occurred. Suitable test cell lines include a mutant strain of
Salmonella typhimurium bacteria having auxotrophic histidine
mutations (Ames et al., Mut. Res. 31:347-364 (1975)). Other
standard test cell lines include Chinese hamster ovary cells
(Galloway et al., Environ. Mutagen. 7:1 (1985); Gulati et al.,
Environ. Mol. Mutagenesis 13:133-193 (1989)) for analysis of
chromosome aberration and sister chromatid exchange induction,
and mouse lymphoma cells (Myhr et al., Prog. Mut. Res.
5:555-568 (1985)).

The use of defined P450 enzymes for activation of
compounds in the present methods offers significant advantages
over previous methods in which rat or human S9-supernatant
liver fractions (containing an assortment of P450 enzymes)
were used. The present methods are more reproducible and also
provide information on the mechanisms by which mutagenesis,
cytotoxicity and carcinogenicity are effected.

C. Identification of Potential Chemopreventive Drugs

The invention also provides methods for identifying
drugs having chemopreventive activity. These methods employ
similar procedures to those discussed in paragraph (2) above
except that the methods are performed using a known mutagenic,
cytotoxic or carcinogenic agent, together with a suspected
chemopreventive agent. Mutagenic, cytotoxic or carcinogenic
effects in the presence of the chemopreventive agents are
compared with those in control experiments in which the
chemopreventive agent is omitted.

D. Screening for Potential Chemotherapeutic Drugs
The invention provides analogous methods to those
described in paragraph (2), supra, for screening
chemotherapeutic agents. In some methods, chemotherapeutic
activity is determined directly on a tumorigenic cell line
expressing 2C19 and/or 2C18 and/or other cytochrome P450
enzymes. In other methods, chemotherapeutic activity is




determined on a tumorigenic test cell line. Chemotherapeutic
activity is evidenced by reversion of the transformed
phenotype of cells resulting in reduced soft agar growth or
reduced tumor formation in nude mice.

E. Programmed Cell Death

The invention provides analogous methods to those
described in paragraph (2), supra, for identifying agents that
induce programmed cell death or apoptosis. Apoptosis may have
an important impact on prevention of malignant transformation.
Programmed cell death is assayed by DNA fragmentation or
cell-surface antigen analysis.

F. Monitoring 2C18 and 2C19 Polypeptides 
The invention provides methods of quantitating the
amount of the specific protein in mammalian tissues by
measuring the complex formed between the antibody and proteins
in the tissue. For example, a biological sample is contacted
with an antibody under conditions such that the antibody binds
to specific proteins forming an antibody:protein complex which
can be quantitatively detected.

VI. Diagnosing 2C19 and 2C18 Polymorphisms 
Diagnostic Assays for Identifying Individuals Deficient in
S-Mephenytoin 4'-Hydroxylase

The invention provides a variety of assays for
identifying individuals deficient in S-mephenytoin
4'-hydroxylase activity. Such individuals comprise about 3-5% of
Caucasian populations and about 20% of Orientals and possibly
blacks. Identification of individuals deficient in
S-mephenytoin 4'-hydroxylase activity is important in selecting
appropriate drugs for treatment of these individuals.
Usually, drugs that are metabolized by S-mephenytoin
4'-hydroxylase should not be administered to these individuals.
The assays diagnose mutations in cDNA or genomic DNA encoding
2C19, which, as discussed above, is the principal human
determinant of S-mephenytoin 4'-hydroxylase activity. The
cDNA assays are particularly useful for de novo localization
of a 2C19 mutation to a particular nucleotide or nucleotides.
The genomic assays are particularly useful for large-scale
screening of individuals for the presence of a mutation that
has previously been localized.

A. Amplification Technologies 

Many of the diagnostic assays rely on amplification
of part or all of a DNA segment encoding a 2C19 polypeptide
(e.g., a 2C19 gene). In a preferred embodiment, target
segments encoding a 2C19 polypeptide are amplified by the
polymerase chain reaction. The PCR process is described in,
e.g., U.S. Patent Nos. 4,683,195; 4,683,202; and 4,965,188;
PCR Technology: Principles and Applications for DNA
Amplification (ed. Erlich, Freeman Press, New York, NY, 1992);
PCR Protocols: A Guide to Methods and Applications (eds. Innis
et al., Academic Press, San Diego, CA, 1990); Mattila et al.,
Nucleic Acids Res. 19:4967 (1991); Eckert & Kunkel, PCR Methods
and Applications 1:17 (1991); PCR (eds. McPherson et al., IRL
Press, Oxford) (each of which is incorporated by reference in
its entirety for all purposes). Reagents, apparatus and
instructions for using the same are commercially available
(e.g., from PECI). Other amplification systems include
ligase chain reaction, Qβ RNA replicase and RNA-transcription-
based amplification systems.

To amplify a target nucleic acid sequence in a
sample by PCR, the sequence must be accessible to the
components of the amplification system. Accessibility can be
achieved by isolating the nucleic acids from the sample. A
variety of techniques for extracting nucleic acids from
biological samples are known in the art. Alternatively, if
the sample is fairly readily disruptable, the nucleic acid
need not be purified prior to amplification by the PCR
technique, i.e., if the sample comprises cells,
particularly peripheral blood lymphocytes or monocytes, lysis
and dispersion of the intracellular components may be
accomplished merely by suspending the cells in hypotonic
buffer. See Han et al., Biochemistry 26:1617-1625 (1987).

For amplification of mRNA sequences, a first step is
the synthesis of a DNA copy (cDNA) of the region to be
amplified by reverse transcription. Reverse transcription is
the polymerization of deoxynucleoside triphosphates to form
primer extension products that are complementary to a
ribonucleic acid template. The process is effected by reverse
transcriptase, an enzyme that initiates synthesis at the
3'-end of the primer and proceeds toward the 5'-end of the
template until synthesis terminates. Examples of suitable
polymerizing agents that convert the RNA target sequence into
a complementary, copy-DNA (cDNA) sequence are avian
myeloblastosis virus reverse transcriptase and Thermus
thermophilus DNA polymerase, a thermostable DNA polymerase
with reverse transcriptase activity marketed by PECI. Reverse
transcription can be carried out as a separate step, or in a
homogeneous reverse transcription-polymerase chain reaction
(RT-PCR). Polymerizing agents suitable for synthesizing a
complementary, copy-DNA (cDNA) sequence from the RNA template
are reverse transcriptase (RT), such as avian myeloblastosis
virus RT, Moloney murine leukemia virus RT, or Thermus
thermophilus (Tth) DNA polymerase, a thermostable DNA
polymerase with reverse transcriptase activity marketed by
PECI.

The first step of each amplification cycle of the
PCR involves the separation of the nucleic acid duplex formed
by the primer extension. Strand separation is achieved by
heating the reaction to a sufficiently high temperature for a
sufficient time to cause the denaturation of the duplex but
not to cause an irreversible denaturation of the polymerase
(see U.S. Patent No. 4,965,188). Typical heat denaturation
involves temperatures ranging from about 80 °C to 105 °C for
times ranging from seconds to minutes. Typically, any initial
RNA template is also degraded during the denaturation step
leaving only DNA template. Other means of strand separation,
including physical, chemical, or enzymatic means, are also
possible.

Once the strands are separated, the next step
involves hybridizing the separated strands with primers that
flank the target sequence. The primers are then extended to
form complementary copies of the target strands.
Template-dependent extension of primers in PCR is catalyzed by a
polymerizing agent in the presence of adequate amounts of four
deoxyribonucleotide triphosphates (typically dATP, dGTP, dCTP,
and dTTP) in a reaction medium comprised of the appropriate
salts, metal cations, and pH buffering system. Suitable
polymerizing agents include, for example, E. coli DNA
polymerase I or its Klenow fragment, T4 DNA polymerase, Tth
polymerase, and Taq polymerase, a heat-stable DNA polymerase
isolated from Thermus aquaticus commercially available from
Perkin-Elmer Cetus Instruments (PECI, Norwalk, CT). See U.S.
Patent No. 4,889,818. See Gelfand, 1989 in PCR Technology,
supra. The polymerizing agents initiate synthesis at the
3'-end of the primer and proceed toward the 5'-end of the
template until synthesis terminates.

The primers are designed so that the position at
which each primer hybridizes along a duplex sequence is such
that an extension product synthesized from one primer, when
separated from the template (complement), serves as a template
for the extension of the other primer. The cycle of
denaturation, hybridization, and extension is repeated as many
times as necessary to obtain the desired amount of amplified
nucleic acid.
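
As a rough illustration of how many cycles may be necessary, the following sketch assumes an idealized reaction in which every cycle multiplies the number of target copies by (1 + efficiency). The function name, the efficiency parameter and the copy numbers are illustrative assumptions and are not taken from the present disclosure.

```python
def cycles_needed(start_copies, target_copies, efficiency=1.0):
    """Count PCR cycles until the copy number reaches the target.

    Assumes an idealized model (not from the source text) in which each
    cycle multiplies the copies by (1 + efficiency); efficiency=1.0 means
    perfect doubling per cycle.
    """
    if start_copies <= 0 or not 0 < efficiency <= 1:
        raise ValueError("need start_copies > 0 and 0 < efficiency <= 1")
    copies = float(start_copies)
    cycles = 0
    while copies < target_copies:
        copies *= 1.0 + efficiency  # one denature/hybridize/extend cycle
        cycles += 1
    return cycles


if __name__ == "__main__":
    # Hypothetical numbers: 100 template copies amplified to 1e9 copies.
    print(cycles_needed(100, 1e9))
```

In practice the per-cycle yield is typically below a perfect doubling and falls off in late cycles, so a count obtained this way is best read as a lower bound.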

The primers are selected to be substantially
complementary to the different strands of each specific
sequence to be amplified. This means that the primers must be
sufficiently complementary to hybridize with their respective
strands. Therefore, the primer sequence need not reflect the
exact sequence of the template. For example, a
non-complementary nucleotide fragment may be attached to the 5'
end of the primer with the remainder of the primer sequence
being complementary to the strand. Alternatively,
non-complementary bases or longer sequences can be interspersed
into the primer, provided that the primer sequence has
sufficient complementarity with the sequence of the strand to
be amplified to hybridize therewith and thereby form a
template for synthesis of the extension product of the other
primer.

Paired primers for amplification of a given segment
of DNA are designated forward and reverse primers.
Conventionally, the orientation of a double-stranded DNA
molecule is that of the sense (or coding) strand, with the
5'-terminus of the coding strand being drawn on the left (see,
e.g., Fig. 15). Under this convention, the forward primer
hybridizes to a double-stranded DNA molecule at a position 5'
(or upstream) from the reverse primer. The forward primer
hybridizes to the complement of the coding strand of the
double-stranded sequence (i.e., the antisense strand) and the
reverse primer hybridizes to the coding strand.
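
The forward/reverse convention described above can be sketched in a few lines of code. The sequence used below is made up purely for illustration and is not a 2C19 sequence; the function names and arguments are likewise hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def primer_pair(sense, fwd_start, rev_end, length):
    """Pick a forward/reverse primer pair from a sense-strand sequence.

    sense     -- coding-strand sequence, 5'->3' (hypothetical input)
    fwd_start -- 0-based index where the forward primer begins
    rev_end   -- 0-based index one past the last base of the amplicon
    length    -- primer length in nucleotides
    """
    # Forward primer: identical to the sense strand, so it hybridizes
    # to the antisense strand.
    forward = sense[fwd_start:fwd_start + length]
    # Reverse primer: reverse complement of the downstream sense
    # subsequence, so it hybridizes to the sense strand.
    reverse = reverse_complement(sense[rev_end - length:rev_end])
    return forward, reverse


if __name__ == "__main__":
    # Made-up 40-nt sense strand, used only to exercise the functions.
    sense = "ATGGCCTTAGCGTACCGATTGCAGGTCCATAGCTTGACCA"
    fwd, rev = primer_pair(sense, 0, len(sense), 10)
    print("forward 5'-%s-3'" % fwd)
    print("reverse 5'-%s-3'" % rev)
```

Both primers are written 5' to 3', which is why the reverse primer is the reverse complement of the sense strand rather than a plain complement.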

The appropriate length of a primer depends on the
intended use of the primer but typically ranges from 10-100,
15-50, 15-30, or more usually, 15 to 25 nucleotides. Shorter
primers tend to lack specificity for a target nucleic acid
sequence and generally require cooler temperatures to form
sufficiently stable hybrid complexes with the template.
Longer primers are expensive to produce and can sometimes
self-hybridize to form hairpin structures.
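
The dependence of hybrid stability on primer length can be illustrated with the Wallace rule of thumb, Tm ~ 2(A+T) + 4(G+C) degrees C. This rule is an outside assumption introduced here only for illustration; it is a coarse estimate for short oligonucleotides and is not part of the present disclosure. The test primers are built from an arbitrary repeating motif.

```python
def wallace_tm(primer):
    """Estimate melting temperature (deg C) with the Wallace rule.

    Tm ~= 2*(A+T) + 4*(G+C). This rule of thumb is an assumption
    brought in for illustration; it is only a rough guide for short
    oligonucleotides and is not part of the source text.
    """
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    if at + gc != len(primer):
        raise ValueError("primer must contain only A, C, G, T")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for length in (10, 15, 20, 25):
        # Hypothetical primers built by repeating "ACGT" to the given length.
        primer = ("ACGT" * 7)[:length]
        print(length, wallace_tm(primer))
```

Under this estimate each added nucleotide raises the predicted melting temperature, consistent with the statement that shorter primers generally require cooler hybridization temperatures.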

The spacing of primers determines the length of
segment to be amplified. The spacing is not usually critical
and amplified segments can range in size from about 25 bp to
at least 35 kbp. Segments of 25-2000, 50-1000, 100-500 bp or
about 400 bp are typical. For larger segments, difficulties
may occasionally be encountered in obtaining efficient and
accurate amplification. For smaller segments, analysis of
amplification products may be more difficult.

The primer can be labelled, if desired, by
incorporating a label detectable by spectroscopic,
photochemical, biochemical, immunochemical, or chemical means.
For example, useful labels include 32P, fluorescent dyes,
electron-dense reagents, enzymes (as commonly used in an
ELISA), biotin, or haptens and proteins for which antisera or
monoclonal antibodies are available. A label can also be used
to "capture" the primer, so as to facilitate the
immobilization of either the primer or a primer extension
product, such as amplified DNA, on a solid support.

B. Tissue Sample for Analysis 
The diagnostic assays are performed on a tissue
sample containing a nucleic acid encoding a 2C19 polypeptide.
For assay of genomic DNA, virtually any tissue sample (other
than pure red blood cells) is suitable. For example,
convenient tissue samples include whole blood, buccal, skin
and hair. For assay of cDNA, the tissue sample must be
obtained from an organ in which a 2C19 gene is expressed, such
as the liver. Liver samples from dead patients are suitable
for de novo localization of mutations (see Section C, infra).
However, for screening of living persons, liver biopsies,
while feasible, are generally undesirable. Thus, for
large-scale screening of living persons, analysis of genomic DNA is
preferred.



C. De Novo Localization of 2C19 Polymorphisms 
2C19 polymorphisms are identified and localized to
specific nucleotides by comparison of nucleic acids from poor
metabolizing individuals with nucleic acids from extensive
metabolizers. The comparison can be initiated directly at the
genomic level. If intron primers are known, individual exons
and intron/exon junctions of 2C19 can be amplified from
genomic DNA. These fragments can be sequenced directly or
analyzed by single-stranded conformational analysis to
indicate the presence of a polymorphism and then analyzed by
sequencing.

Comparison is sometimes initiated at the cDNA level
because of the shorter size of cDNA (about 1750 bp) relative
to genomic DNA (about 55 kbp). cDNA is amplified from liver
samples of individuals known to have phenotypic S-mephenytoin
metabolic deficiencies, and the cDNA sequence is compared with
the wildtype sequence shown in SEQ. ID. No. 2. Often, the
full-length cDNA is amplified. An initial comparison can be
performed by single-stranded conformational analysis to
indicate the existence of a polymorphism. The polymorphism is




then localized by sequence analysis indicating the site of
mutations in cDNA. Of course, the amplification product can
also be sequenced directly without prior conformational
analysis. Having localized a mutation in cDNA, a
corresponding region of genomic 2C19 DNA is amplified. The
genomic DNA is usually amplified from primers spanning the
mutation. At least one of the primers for this amplification
usually comprises a subsequence of the cDNA sequence proximate
to (i.e., within 25-200 bp of) the cDNA mutation. Primers can
also comprise subsequences of genomic 2C19 DNA that have
already been sequenced, subsequences from related genomic
sequences, such as 2C18 or 2C9 (see de Morais et al., Biochem.
Biophys. Res. Commun. 194:194-201 (1993)) (incorporated by
reference in its entirety for all purposes), or can be random.
An amplified genomic fragment spanning the portion of the
coding region in which the cDNA polymorphism occurs is
sequenced and compared with the corresponding region from a
2C19 sequence from an individual exhibiting extensive
S-mephenytoin 4'-hydroxylase metabolism to identify the locus of
the genomic mutation.

In some instances, there will be a simple 
relationship between genomic and cDNA mutations. That is, a 
single base change in a coding region of genomic DNA can give 
rise to a corresponding mutated codon in the cDNA. In other 
instances, the relationship between genomic and cDNA mutations 
is more complex. Thus, for example, a single base change in 
genomic DNA creating an aberrant splice site can give rise to 
deletion of a substantial segment of cDNA in a poor 
metabolizing individual. 

D. The 681 and 636 Polymorphisms 

The principal mutation in individuals deficient in
the S-mephenytoin 4'-hydroxylase activity is designated the
681 polymorphism. See Example 7. The 681 polymorphism
results from a single-base mutation in genomic 2C19 DNA at
nucleotide position 681 of the coding region. A nucleotide in
a coding (i.e., exonic) region of genomic 2C19 DNA is
designated the same number as the corresponding nucleotide in
the cDNA sequence shown in SEQ. ID. No. 2, when the genomic
coding sequence is maximally aligned with the cDNA sequence.
The 681 polymorphism results in a G/A transition at
nucleotide 681 of the coding region. Homozygous mutations at
this position occur in about 70% of individuals having a
low-producing (i.e., defective) S-mephenytoin 4'-hydroxylase
phenotype. The mutation is inherited in an autosomal
recessive fashion. Thus, individuals heterozygous in this
mutation usually exhibit normal (i.e., extensive) S-mephenytoin
activity. Fortuitously, the mutation confers two distinct
properties that facilitate its identification. In genomic
DNA, the polymorphism results in loss of several restriction
enzyme sites (e.g., SmaI) and acquisition of other restriction
sites (e.g., EcoRII) in mutant individuals compared with
wildtype individuals. These restriction sites include the 681
nucleotide. In mRNA or cDNA, the 681 mutation results in a
deletion of 40 bp spanning nucleotides 643-682 of the wildtype
cDNA sequence shown in Fig. 12. The deletion is the
consequence of an altered splice pattern stemming from the
presence of the 681 polymorphism in genomic DNA.
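
The size of the cDNA deletion follows directly from the quoted coordinates, and a short calculation shows why a deletion of this length would be expected to disturb the reading frame. The inference about the reading frame is a general consequence of the arithmetic (a codon is three nucleotides) and is offered here as an illustration; the function names are hypothetical.

```python
def deletion_length(first, last):
    """Number of nucleotides removed when positions first..last
    (inclusive, 1-based cDNA numbering) are deleted."""
    return last - first + 1


def shifts_reading_frame(n_deleted):
    """A deletion preserves the reading frame only if its length is a
    multiple of three (one codon = three nucleotides)."""
    return n_deleted % 3 != 0


if __name__ == "__main__":
    n = deletion_length(643, 682)  # positions quoted in the text
    print(n, shifts_reading_frame(n))
```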

A second polymorphism is designated the 636
polymorphism. See Example 8. The 636 polymorphism results
from a single-base mutation in genomic 2C19 DNA at nucleotide
position 636. The 636 polymorphism results in a G/A
transition thereby introducing a premature stop codon into
2C19 mRNA. The mutation is easily recognized by the loss of,
e.g., a BamHI site in both genomic DNA and cDNA and acquisition
of, e.g., a HinfI site. The mutation is inherited in an autosomal
recessive fashion. Homozygous mutations at nucleotide 636
account for about 10% of low-producing phenotypes in
Orientals. Heterozygous individuals having one allele
defective in the 636 polymorphism and the other allele
defective in the 681 polymorphism account for all or nearly
all of the remaining 15% of low-producing Oriental
individuals. Thus, the 681 and 636 polymorphisms collectively
account for all, or nearly all, low-producing phenotypes in
Orientals.

In Caucasians, the 636 polymorphism is less
prevalent and some low-producing individuals probably have a
mutation at a locus other than nucleotide 681 or 636 of the
coding sequence. Conceivably, a few mutations might occur in
other genes that exert regulatory control over the 2C19 gene.
However, most, if not all, of the remaining mutations probably
result from additional polymorphisms in the 2C19 gene.

E. Screening Assays for Defined Mutations 
The invention provides assays that permit
large-scale screening of individuals for the presence of defined
mutations. Of course, detection of the 681 and 636 mutations,
which account for all or nearly all deficiencies in Orientals
and about 75% of deficiencies in Caucasians, is of primary
importance. An assay on an individual under test is often
performed in parallel with control assays on DNA samples from
subjects of known phenotype (i.e., extensive or poor
metabolizer of S-mephenytoin).

1. Genomic Assays

Assays are preferably performed on a genomic
substrate because of the ready availability of tissue samples
containing genomic DNA.

a. Amplification of Segments Spanning a Defined Mutation

A preferred strategy for analysis entails
amplification of a DNA sequence spanning previously localized
polymorphism(s) (e.g., the 681 and/or 636 polymorphisms).
Amplification of such a sequence can be primed from forward
and reverse primers that hybridize to a 2C19 gene on opposite
sides of a mutation (e.g., the 681 mutation), but which do not
hybridize to the mutated nucleotide itself. That is, for
detection of the 681 polymorphism, the forward primer
hybridizes upstream or 5' to the 681 nucleotide and the
reverse primer hybridizes downstream or 3' to this nucleotide.
Similarly, for detection of the 636 polymorphism, the forward
primer hybridizes upstream or 5' to the 636 nucleotide and the
reverse primer hybridizes downstream or 3' to this nucleotide.
For simultaneous analysis of 636 and 681 polymorphisms, the
forward primer hybridizes upstream or 5' to the 636 nucleotide
and the reverse primer hybridizes downstream or 3' to
nucleotide 681.

The forward primer is sufficiently complementary to
the antisense strand of a 2C19 DNA sequence to hybridize
therewith and the reverse primer is sufficiently complementary
to the sense strand of the 2C19 sequence to hybridize
therewith. The primers usually comprise first and second
subsequences from opposite strands of a double-stranded 2C19
DNA sequence. Isolated points of mismatch between a primer
and a corresponding 2C19 subsequence can usually be tolerated
but are not preferred. It is particularly important to avoid
mismatches in the two nucleotides at the 3' end of the primer
(especially the terminal nucleotide).

Because allelic variants of 2C19 exhibit at least
about 97% sequence identity to each other, it is not critical
which variant is selected as a source of subsequences for
incorporation into forward and reverse primers. For example,
suitable subsequences can be obtained from the genomic 2C19
sequence defined as wildtype in Figs. 15-17. Fig. 15 provides
genomic sequence immediately flanking the 681 mutation, and
Figure 16 provides more distal flanking sequences. Figure 17
provides genomic sequence flanking the 636 mutation. These
figures provide sufficient sequence for selection of a
multitude of paired primers for amplification of a sequence
spanning the 681 and/or 636 polymorphisms. Although there is
no apparent advantage in doing so, additional genomic
sequence flanking the regions already sequenced could easily
be determined by PCR-based gene walking. See Parker et al.,
Nucl. Acids Res. 19:3055-3060. A specific primer for the
sequenced region is paired with a general primer that
hybridizes to the flanking region.

Forward primers often comprise about 10-50 and
preferably 15-30 contiguous nucleotides from the wildtype 2C19
sequences shown in Figs. 15-17 (which is the coding or sense
sequence). Reverse primers often comprise about 10-50 or
15-30 nucleotides from the complement of the wildtype 2C19
sequence shown in Figs. 15-17. The complement of the sequence
shown in Figs. 15-17 is also referred to as the antisense
sequence. A primer (or its complement) preferably exhibits
100% sequence identity with a corresponding 2C19 subsequence
to which it hybridizes over a window of about 15-30 bp. For
amplification of the 681 polymorphism, forward primers
preferably comprise a segment of contiguous nucleotides from
the fourth intronic region and reverse primers a segment of
contiguous nucleotides from the fifth exonic or intronic
region. For amplification of the 636 polymorphism, forward
primers preferably comprise a segment of contiguous
nucleotides from the third intronic region and reverse primers
a segment of contiguous nucleotides from the fourth intronic
region. For amplification of both the 636 and 681
polymorphisms, forward primers preferably comprise a segment
of contiguous nucleotides from the third intronic region and
reverse primers a segment of contiguous nucleotides from the
fifth exonic region or fifth intronic region. See Figure 19.
As noted above, the spacing of the subsequences is not
critical, but a separation of about 50-2000 bp is typical. For
simultaneous amplification of the 636 and 681 mutations, the
spacing is typically 1000-1500 bp. For amplification of
either mutation alone, a spacing of about 400 bp is typical.

Preferred primers exhibit perfect sequence identity
to 2C19 and lesser sequence identity to corresponding regions
of related genes, such as 2C9 and 2C18. Such primers are
designed by comparison of the wildtype 2C19 sequence shown in
Figs. 15-17 with corresponding sequences from 2C9 and 2C18
described by de Morais et al., supra. In general, sequence
divergence between the three genes is expected to be greater
in intronic sequences. An exemplary pair of primers for
amplifying a segment spanning the 681 mutation is described in
Example 7. A forward primer, 5'-AATTACAACCAGAGCTTGGC-3' (SEQ.
ID. No. 55), exhibits perfect sequence identity to a
subsequence from the wildtype 2C19 sense strand within
intron 4. A reverse primer, 5'-TATCACTTTCCATAAAAGCAAG-3'
(SEQ. ID. No. 56), exhibits perfect sequence identity to the
antisense strand of the wildtype 2C19 sequence within exon 5.
The amplification product from these primers has a length of
169 bp. An exemplary pair of primers for amplifying a segment
spanning the 636 mutation is described in Example 8. A
forward primer, 5'-TATTATCTGTTAACTAATATGA-3' (SEQ. ID. No. 57),
exhibits perfect sequence identity to a subsequence from the
wildtype 2C19 sense strand within intron 3. A reverse primer,
5'-ACTTCAGGGCTTGGTCAATA-3' (SEQ. ID. No. 58), exhibits perfect
sequence identity to the antisense strand of the wildtype 2C19
sequence within intron 4. The amplification product from
these primers has a length of 329 bp.

Having amplified a segment of a 2C19 gene known to
span a polymorphism, a variety of assays are available for
determining whether a mutation is present in an individual
under test. A generally applicable, but relatively laborious,
assay is to sequence the amplified fragment across the
polymorphic locus and compare the resulting sequence with the
wildtype 2C19 sequence shown in Figs. 15-17.

A simpler assay, but one applicable to only certain
mutations, is to compare the size or restriction profile of
the amplified segment, optionally in comparison with a
corresponding wildtype 2C19 segment. For the 681
polymorphism, restriction analysis provides a rapid and
clear-cut means of identifying a mutant allele. The 681
polymorphism results in loss of a SmaI site and acquisition of
an EcoRII site in mutant alleles. Thus, SmaI digestion of a
wildtype allele produces an extra band compared with a mutant
allele. For the amplification product obtained using the
exemplified primers discussed above, SmaI digestion of the
wildtype product yields fragments of 120 and 49 bp, whereas
the mutant amplification product remains uncut yielding a
single fragment of 169 bp. In individuals homozygous for the
wildtype allele, only the 120 bp and 49 bp bands are present.
In individuals homozygous for the mutant allele, only the 169
bp band is present. In heterozygotes, all three bands (i.e.,
169, 120 and 49 bp) are present. The bands can usually be
detected by agarose or acrylamide gel electrophoresis and
ethidium bromide staining. If greater sensitivity is needed,
the amplification product is labelled and the bands detected
by, e.g., autoradiography. Of course, the assay can also be
performed using an isoschizomer of SmaI with identical
results. The assay can also be performed by digesting with
EcoRII or an isoschizomer thereof. In this case, one obtains
a mirror image of the results obtained for SmaI digestion,
because the mutant 2C19 allele contains an additional EcoRII
site relative to the wildtype allele. As a quality control
measure, both SmaI and EcoRII digestions can be performed on
separate aliquots of a test sample. Of course, any other
enzyme that recognizes a site that includes the 681
polymorphism can also be used. For example, alternatives to
SmaI (i.e., that cleave only the wildtype allele) include
AvaI, MspI, NciI, ScrFI and TspEI.
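
The interpretation of the SmaI banding pattern described above can be written as a small lookup. The band sizes are those stated in the text for the 169 bp amplification product; the function name and the genotype labels are illustrative choices, and real gels would need a tolerance for measured band sizes that this sketch does not attempt.

```python
# Expected SmaI fragments (bp) for the 169 bp amplicon described in the text.
WILDTYPE_BANDS = frozenset({120, 49})  # SmaI cuts the wildtype allele
MUTANT_BANDS = frozenset({169})        # the 681 mutant allele stays uncut


def call_681_genotype(observed_bands):
    """Return a genotype label from the set of band sizes seen on the gel."""
    bands = frozenset(observed_bands)
    if bands == WILDTYPE_BANDS:
        return "wt/wt"
    if bands == MUTANT_BANDS:
        return "681/681"
    if bands == WILDTYPE_BANDS | MUTANT_BANDS:
        return "wt/681"
    return "uninterpretable"


if __name__ == "__main__":
    for bands in ([120, 49], [169], [169, 120, 49], [169, 120]):
        print(bands, "->", call_681_genotype(bands))
```

A pattern that matches none of the three expected outcomes (for example, a partial digest) is reported as uninterpretable rather than forced into a genotype.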

The 636 polymorphism can be similarly analyzed by
digestion with, e.g., BamHI. BamHI digestion of a wildtype
allele produces an extra band compared with a mutant allele.
For the amplification product obtained using the exemplified
primers discussed above, BamHI digestion of the wildtype
product yields fragments of 233 and 96 bp, and digestion of
the mutant product yields a single fragment of 329 bp. In
individuals homozygous for the wildtype allele, only the 233
bp and 96 bp bands are present. In individuals homozygous for
the mutant allele, only the 329 bp band is present. In
heterozygotes, all three bands are present. Of course, other
enzymes that cut the wildtype allele at the polymorphic locus
but not the 636 mutant allele, or vice versa, can also be
used. For example, alternatives to BamHI include AlwI, BsaJI,
BstVI, DpnI, EcoRII, NlaIV, Sau3AI and ScrFI. Enzymes that
recognize a site on the mutant allele including nucleotide
636, but do not recognize the wildtype allele, include HinfI
and TfiI.
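
The expected bands for a single-site digest can also be predicted from the amplicon length and the cut position. The sketch below uses the 329 bp product and the 233/96 bp fragments given above; which end of the amplicon the 233 bp fragment lies on is not needed for the calculation and is chosen arbitrarily. The function and argument names are hypothetical.

```python
def digest_bands(amplicon_len, cut_offset, allele_is_cut):
    """Predict band sizes (bp) after a single-site digest of a diploid sample.

    amplicon_len  -- length of the PCR product in bp
    cut_offset    -- distance in bp from one end to the cut site
                     (which end is an arbitrary choice here)
    allele_is_cut -- two booleans, one per allele: True if the enzyme
                     cuts that allele at the polymorphic site
    """
    bands = set()
    for is_cut in allele_is_cut:
        if is_cut:
            bands.update({cut_offset, amplicon_len - cut_offset})
        else:
            bands.add(amplicon_len)
    return sorted(bands, reverse=True)


if __name__ == "__main__":
    # BamHI on the 329 bp product: wildtype alleles are cut, 636 mutant is not.
    print("wt/wt  ", digest_bands(329, 233, (True, True)))
    print("wt/636 ", digest_bands(329, 233, (True, False)))
    print("636/636", digest_bands(329, 233, (False, False)))
```

For an enzyme such as HinfI that cuts only the mutant allele, the same function applies with the booleans inverted, although the cut position would have to be taken from the actual sequence.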

For simultaneous detection of the 681 and 636
polymorphisms after amplification of a fragment spanning both
polymorphisms, the DNA can be double digested with two of the
enzymes mentioned above. One enzyme should distinguish
the mutant 681 allele from a wildtype allele and the
other should distinguish the mutant 636 allele from a wildtype
allele. For example, double digestion with SmaI and BamHI is
suitable. The double digestion generates six different
restriction patterns corresponding to the six possible
genotypes: wt/wt, wt/681, wt/636, 681/681, 636/636 and
681/636. See Figure 19.
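
The count of six genotypes follows from choosing two alleles, with repetition, from three allele types. The sketch below enumerates them and records, for each genotype, how many alleles would resist SmaI and how many would resist BamHI. It assumes, for illustration only, that each mutant allele carries just one of the two mutations; the counts are a bookkeeping device and do not replace the actual band sizes shown in Figure 19.

```python
from itertools import combinations_with_replacement

ALLELES = ("wt", "681", "636")


def possible_genotypes(alleles=ALLELES):
    """List unordered pairs of alleles (diploid genotypes)."""
    return ["/".join(pair) for pair in combinations_with_replacement(alleles, 2)]


def site_loss_signature(genotype):
    """Return (alleles uncut by SmaI, alleles uncut by BamHI).

    Assumes a 681 allele has lost only the SmaI site and a 636 allele
    has lost only the BamHI site (an illustrative simplification).
    """
    a, b = genotype.split("/")
    smai_resistant = (a == "681") + (b == "681")
    bamhi_resistant = (a == "636") + (b == "636")
    return smai_resistant, bamhi_resistant


if __name__ == "__main__":
    for genotype in possible_genotypes():
        print(genotype, site_loss_signature(genotype))
```

Each genotype yields a different pair of counts, which is consistent with the statement that the double digestion distinguishes all six.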

In another assay, amplification products are
subjected to single-stranded conformational analysis. See,
e.g., Hayashi, PCR Methods & Applications 1, 34-38 (1991);
Orita, Proc. Natl. Acad. Sci. USA 86, 2766-2770 (1989); Orita
et al., Genomics 5, 874-879 (1989). This method is capable of
detecting many single base mutations in DNA fragments up to
200 bp irrespective of whether the mutation causes a change in
restriction fragment profile. In this method, the PCR
reaction is performed using at least one labelled nucleotide
or labelled primer to obtain a labelled amplified fragment.
The amplification product is then denatured and the strands
resolved by polyacrylamide gel electrophoresis under
nondenaturing conditions. Mutations are detected by altered
mobility of separated single strands.

b. Selective Amplification of an Allelic Variant

An alternative method for detecting defined
mutations in a 2C19 gene employs a selective strategy whereby
a wildtype allele is amplified without amplification of a
mutant allele (or vice versa). This is accomplished by
designing one of the primers to hybridize to a subsequence
overlapping a defined polymorphism (for example, the 681
polymorphism). Such a primer can be designed to hybridize to
one polymorphic allele without hybridizing to the other.
Thus, when such a primer is paired with a second primer
hybridizing distal to the polymorphic region, amplification
will only occur for one polymorphic allele.

For diagnosis of the 681 polymorphism, selective
amplification of the wildtype allele of 2C19 can be
accomplished using a forward primer that has about 10-50, and
usually 15-30, nucleotides from the wildtype 2C19 sequence
shown in Fig. 15 or 16, including nucleotide 681. Such a
forward primer, when paired with any suitable reverse primer
downstream from nucleotide 681 (i.e., sufficiently
complementary to the sense strand of 2C19 to hybridize
therewith), can be used to amplify selectively the wildtype
allele without amplifying a mutant allele. The selectivity
between amplification of wildtype and mutant alleles is
greatest when the 681 nucleotide occurs near, or preferably
at, the 3' end of the primer. Because extension proceeds from
the 3' end of the primer, a mismatch at or near this position
is most inhibitory of amplification. The same result can be
achieved by using a reverse primer that has about 10-50, or
usually 15-30, contiguous nucleotides from the complement of
the wildtype 2C19 sequence shown in Fig. 15 or 16 (i.e., the
antisense strand) including the nucleotide at position 681.
Such a reverse primer can be paired with any suitable forward
primer sufficiently complementary to a subsequence of the
antisense strand of the 2C19 gene upstream from nucleotide 681
to hybridize therewith. The 681 nucleotide should again be at
or near the 3' end of the reverse primer.
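
The placement of the polymorphic nucleotide at the 3' end of either primer can be illustrated with a short sketch. It assumes a sense-strand sequence supplied as a string with 1-based positions. The toy sequence, the primer length and the function names are hypothetical and are not taken from Figs. 15-17.

```python
# Complement table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def forward_allele_primer(sense, pos, length=20):
    """Forward primer whose 3'-terminal base is the sense-strand base at `pos` (1-based)."""
    if pos < length:
        raise ValueError("not enough upstream sequence for this primer length")
    return sense[pos - length:pos]

def reverse_allele_primer(sense, pos, length=20):
    """Reverse (antisense) primer whose 3'-terminal base pairs with position `pos`."""
    window = sense[pos - 1:pos - 1 + length]
    if len(window) < length:
        raise ValueError("not enough downstream sequence for this primer length")
    return reverse_complement(window)

# Hypothetical 40-nt sense strand; position 20 stands in for the polymorphic site.
toy_sense = "ACGT" * 5 + "GCAT" * 5
fwd = forward_allele_primer(toy_sense, 20, length=10)
rev = reverse_allele_primer(toy_sense, 20, length=10)
print(fwd, rev)
```

In both cases the last base of the primer is the one that matches, or pairs with, the polymorphic position, so a mismatched allele presents its mismatch at the point of extension.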

Selective amplification of a 681 mutant allele is
accomplished by an analogous strategy in which primers are
designed to hybridize to the mutant allele without hybridizing
to the wildtype. A suitable forward primer for amplification
comprises about 10-50, or usually 15-30, contiguous nucleotides
from the mutant 2C19 sequence shown in Fig. 15 or 16 (i.e.,
the sense strand). The forward primer can be paired with any
suitable reverse primer sufficiently complementary to the
sense strand of a downstream 2C19 subsequence to hybridize
therewith. Alternatively, the same result can be achieved
using a reverse primer comprising about 10-50 or 15-30
contiguous nucleotides from the complement of the mutant 2C19
sequence shown in Fig. 15 or 16 (i.e., the antisense strand).
Such a reverse primer can be paired with any suitable forward
primer sufficiently complementary to the antisense strand of
an upstream 2C19 subsequence to hybridize therewith.

For diagnosis of the 636 polymorphism, selective
amplification of the wildtype allele of 2C19 can be
accomplished using a forward primer that has about 10-50, and
usually 15-30, nucleotides from the wildtype 2C19 genomic
sequence shown in Fig. 17, including nucleotide 636. Such a
forward primer, when paired with any suitable reverse primer
downstream from nucleotide 636 (i.e., sufficiently
complementary to the sense strand of 2C19 to hybridize
therewith), can be used to amplify selectively the wildtype
allele without amplifying a mutant allele. The 636 nucleotide
usually occurs near, or preferably at, the 3' end of the
primer. The same result can be achieved by using a reverse
primer that has about 10-50, or usually 15-30, contiguous
nucleotides from the complement of the wildtype 2C19 genomic
sequence shown in Fig. 17 (i.e., the antisense strand)
including the nucleotide at position 636. Such a reverse
primer can be paired with any suitable forward primer
sufficiently complementary to a sequence of the antisense
strand of the 2C19 gene upstream from nucleotide 636 to
hybridize therewith. The 636 nucleotide should again be at or
near the 3' end of the reverse primer.

For selective amplification of a 636 mutant allele, a
suitable forward primer for amplification comprises about
10-50, or usually 15-30, contiguous nucleotides including
nucleotide 636 from the mutant 2C19 genomic sequence shown in
Fig. 17 (i.e., the sense strand). The forward primer can be
paired with any suitable reverse primer sufficiently
complementary to the sense strand of a 2C19 genomic
subsequence downstream from nucleotide 636 to hybridize
therewith. Alternatively, the same result can be achieved
using a reverse primer comprising about 10-50 or 15-30
contiguous nucleotides including nucleotide 636 from the
complement of the mutant 2C19 sequence shown in Fig. 17 (i.e.,
the antisense strand). Such a reverse primer can be paired
with any suitable forward primer sufficiently complementary to
the antisense strand of a 2C19 subsequence upstream from
nucleotide 636 to hybridize therewith.

Following amplification, the sample under test is
characterized as wildtype or mutant by the presence or absence
of an amplification product. With a primer designed for
selective amplification of the wildtype allele, the presence
of an amplification product is indicative of that allele and
the absence of an amplification product is indicative of a
mutant allele. The converse applies for primers designed for
selective amplification of a mutant allele. In a preferred
assay, a sample is divided into two aliquots, one of which is
amplified using primers for wildtype allele amplification, the
other of which is amplified using primers appropriate for
mutant allele amplification. The presence of an amplification
product in one but not both of the aliquots indicates that the
individual under test is either wildtype or homozygous for
the mutation (depending on the aliquot in which the
amplification product occurred). The presence of amplification
product in both aliquots indicates that the individual is
heterozygous. The absence of an amplification product in both
aliquots would indicate either the absence of a 2C19 gene or a
quality control problem in the amplification procedure
requiring that the assay be repeated. Coamplification of a
second known standard human gene using a second set of primers
can aid in distinguishing between these possibilities. If both
bands are missing, the problem is probably quality control,
while amplification of only the standard gene is suggestive
that the CYP2C19 gene may be deleted.
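
The interpretation of the two-aliquot assay amounts to a small decision table, sketched below. The function name and the wording of the returned calls are illustrative assumptions, and the optional third argument represents the coamplified standard gene.

```python
def call_genotype(wildtype_product, mutant_product, control_product=None):
    """Interpret a two-aliquot allele-specific amplification.

    wildtype_product / mutant_product: True if a product was seen in the
    aliquot amplified with wildtype- or mutant-selective primers.
    control_product: True/False for the coamplified standard gene, or None
    if no standard gene was coamplified.
    """
    if wildtype_product and mutant_product:
        return "heterozygous"
    if wildtype_product:
        return "homozygous wildtype"
    if mutant_product:
        return "homozygous mutant"
    # No product in either aliquot: use the standard gene, if available.
    if control_product is None:
        return "no call: 2C19 absent or quality control problem"
    if control_product:
        return "no call: 2C19 gene may be deleted"
    return "no call: probable quality control problem, repeat assay"

print(call_genotype(True, True))
print(call_genotype(False, False, control_product=True))
```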

The presence or absence of amplification products
can be detected by gel electrophoresis. Gels are usually
visualized by ethidium bromide staining. However, if greater
sensitivity is required, fragments can be labelled in the
course of amplification. Amplified fragments can be
electrophoresed directly or can be cut with any restriction
enzyme that releases fragments of a convenient size from the
amplification products. For the simultaneous analysis of
multiple samples, the dot-blot method may be advantageous. In
the dot-blot method, multiple unlabelled amplification
mixtures are bound to discrete locations on a solid support,
such as a membrane. The membrane is incubated with labeled
probe under suitable hybridization conditions, the
unhybridized probe removed by washing, and the filter
monitored for the presence of bound probe.

c. Southern Blotting

For polymorphic mutations resulting in loss or
acquisition of a restriction site (such as the 681 and 636
polymorphisms), samples of genomic DNA can also be analyzed by
Southern blotting without the need for prior amplification.
The DNA is digested with an enzyme that cuts a wildtype allele
but not a mutant allele or vice versa (e.g., BamHI, SmaI,
EcoRII or HinfI, or isoschizomers of any of these). For
analysis of the 681 polymorphism, digestion with SmaI or
isoschizomers results in an additional fragment from the
wildtype allele compared with the mutant allele. Digestion
with EcoRII or isoschizomers results in an additional fragment
from the mutant allele. Digestion products are detected with
a 2C19 probe. For analysis of the 636 polymorphism, digestion
with BamHI or isoschizomers results in an additional fragment
from the wildtype allele compared with the mutant allele.
Digestion with HinfI results in an additional fragment from
the mutant allele. The probe can be any segment of a 2C19 DNA
sequence that includes the polymorphism and extends for at
least about 20 nucleotides on either side.

2. cDNA Assays

Defined polymorphisms can also be detected by
analysis of cDNA by similar strategies to those employed for
genomic DNA. However, the primers appropriate for
amplification procedures are not necessarily interchangeable
for the two substrates. Suitable primers for analysis of the
681 and 636 polymorphisms in cDNA are described below.

a. Amplification of Segments Spanning a Defined Mutation

The 681 polymorphism in genomic DNA results in
a 40 bp deletion of cDNA comprising nucleotides 643-682 of the
wildtype 2C19 cDNA or genomic sequence shown in Fig. 12. The
forward and reverse primers are therefore designed to
hybridize to 2C19 subsequences on opposite sides of this
deletion. Thus, for example, a forward primer can hybridize
to the antisense strand of a 2C19 sequence upstream from
nucleotide 643 of the coding region. Such a forward primer
should be paired with a reverse primer that hybridizes to the
sense strand of the 2C19 sequence downstream from nucleotide
682. Nucleotides in a 2C19 DNA sequence are assigned the
numbers of the corresponding nucleotides in the wildtype cDNA
sequence shown in SEQ. ID. No. 2 (or Fig. 12, which shows a
subsequence of SEQ. ID. No. 2), when the sequences are
maximally aligned. Preferably, the forward primer comprises
about 10-50 or 15-30 contiguous nucleotides upstream of
nucleotide 645 from the wildtype 2C19 cDNA sequence shown in
Fig. 12 or SEQ. ID. No. 2. Analogously, the reverse primer
preferably comprises about 10-50 or 15-30 contiguous
nucleotides from the complement of the wildtype 2C19 cDNA
sequence shown in Fig. 12 or SEQ. ID. No. 2 downstream from
nucleotide 682 of the coding region. For example, a forward
primer comprising 5'-ATTGAATGAAAACATCAGGATTG-3' (SEQ. ID.
No. 59) and a reverse primer comprising
5'-GTAAGTCAGCTGCAGTGATTA-3' (SEQ. ID. No. 60) form a suitable
pair. The amplification product from such primers is 40 bp
longer for the wildtype 2C19 cDNA sequence than for the 681
mutant sequence.
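
The 40 bp size difference follows directly from the deleted segment (nucleotides 643-682 inclusive). The sketch below computes the expected product lengths for any primer pair flanking the deletion. The primer coordinates used in the example call (600 and 800) are hypothetical and are not the positions of SEQ. ID. Nos. 59 and 60.

```python
# Deleted cDNA segment in the 681 mutant, as stated in the text (inclusive).
DELETION_START, DELETION_END = 643, 682

def product_lengths(fwd_start, rev_end, del_start=DELETION_START, del_end=DELETION_END):
    """Expected amplicon lengths (wildtype, mutant) in bp.

    fwd_start: wildtype cDNA coordinate of the 5' end of the forward primer.
    rev_end:   wildtype cDNA coordinate paired with the 5' end of the reverse primer.
    """
    if not (fwd_start < del_start and rev_end > del_end):
        raise ValueError("primers must flank the deleted segment")
    wildtype = rev_end - fwd_start + 1
    mutant = wildtype - (del_end - del_start + 1)
    return wildtype, mutant

wt_len, mut_len = product_lengths(600, 800)  # hypothetical primer positions
print(wt_len, mut_len, wt_len - mut_len)
```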

For detection of the 636 polymorphism, the forward
and reverse primers are designed to hybridize to 2C19
subsequences on opposite sides of nucleotide 636. Thus, for
example, a forward primer can hybridize to the antisense
strand of a 2C19 sequence upstream from nucleotide 636 of the
coding region. Such a forward primer should be paired with a
reverse primer that hybridizes to the sense strand of the 2C19
sequence downstream from nucleotide 636 (SEQ. ID. No. 2 or
Fig. 12). Preferably, the forward primer comprises about
10-50 or 15-30 contiguous nucleotides upstream of nucleotide
636 from the wildtype 2C19 cDNA sequence shown in Fig. 12 or
SEQ. ID. No. 2. Analogously, the reverse primer preferably
comprises about 10-50 or 15-30 contiguous nucleotides from the
complement of the wildtype 2C19 cDNA sequence shown in Fig. 12
or SEQ. ID. No. 2 downstream from nucleotide 636 of the coding
region.

For simultaneous detection of the 636 and 681
polymorphisms, the forward primer should be as described for
detection of the 636 polymorphism and the reverse primer as
described for detection of the 681 polymorphism. These
primers will amplify a segment of DNA spanning both the 636
and 681 polymorphisms.

Amplification products are usually analyzed by gel
electrophoresis. The products can be analyzed uncut or can be
cleaved with any restriction enzyme having a site in the
amplification product. For detection of the 681 polymorphism,
SmaI and its isoschizomers are particularly useful because
wildtype 2C19 DNA contains a restriction site that is not
present in the mutant form. See Fig. 12. Similarly, BamHI and
its isoschizomers are particularly useful for detection of the
636 polymorphism. Analysis of fragments allows distinction
between wildtype, homozygous and heterozygous mutations as
discussed for the corresponding genomic assay.
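
How the loss of a restriction site changes the fragment pattern can be sketched as follows. The recognition sequences and cut positions for SmaI (CCC^GGG) and BamHI (G^GATCC) are the standard ones. The two 16-nt test sequences are hypothetical stand-ins and are not taken from Fig. 12.

```python
# Recognition site and cut offset (bases from the start of the site, top strand).
SITES = {"SmaI": ("CCCGGG", 3), "BamHI": ("GGATCC", 1)}

def fragment_lengths(seq, enzyme):
    """Lengths of the fragments obtained by digesting a linear sequence."""
    site, cut_offset = SITES[enzyme]
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]

toy_wildtype = "AAAAACCCGGGTTTTT"  # hypothetical product containing a SmaI site
toy_mutant = "AAAAACCCAGGTTTTT"    # same product with the site destroyed by G to A

print(fragment_lengths(toy_wildtype, "SmaI"))
print(fragment_lengths(toy_mutant, "SmaI"))
```

A heterozygous sample would show the union of the two patterns, i.e., both the cut and the uncut fragments.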



b. Selective Amplification of an Allelic Variant

For analysis of the 681 polymorphism, selective
amplification of the wildtype variant is achieved by selecting
a forward or reverse primer that overlaps nucleotides 643-682
of the wildtype 2C19 cDNA sequence (Fig. 12). This segment of
nucleotides is not present in a mutant allele. Thus, a primer
hybridizing to this segment of the wildtype allele will not
hybridize to the mutant allele. Accordingly, such primers can
be used to prime amplification of the wildtype allele without
priming amplification of the mutant allele. For example, a
forward primer that hybridizes to the complement of the
wildtype 2C19 cDNA sequence shown in Fig. 12 between
nucleotides 643-682 without hybridizing to the complement of
the mutant 2C19 DNA sequence shown in Fig. 12 is suitable.
Such a forward primer can be paired with any suitable reverse
primer sufficiently complementary with a downstream
subsequence of the sense strand of the 2C19 cDNA to hybridize
therewith.

Alternatively, a reverse primer is designed that
hybridizes to the wildtype 2C19 cDNA sequence shown in Fig. 12
between nucleotides 643 and 682 without hybridizing to the
mutant 2C19 cDNA sequence shown in Fig. 12. Such a reverse
primer can be paired with any suitable forward primer
sufficiently complementary with an upstream subsequence of the
antisense strand of the 2C19 cDNA to hybridize therewith.

Primers for selective amplification of the mutant
allele can also be designed. A suitable primer hybridizes to
two 2C19 subsequences, of about 1-50, 5-30 or 10-20
nucleotides, which subsequences are separated by nucleotides
643-682 in the wildtype sequence, but which are contiguous in
the mutant sequence. Such primers hybridize to mutant 2C19
cDNA sequences without hybridizing to wildtype sequences. For
example, a forward primer comprising a subsequence of
nucleotides 633-642 of the wildtype 2C19 cDNA sequence shown
in Fig. 12 joined to a second subsequence of nucleotides
684-693 of this sequence is suitable. This primer can be paired
with any suitable reverse primer sufficiently complementary to
a downstream subsequence of the sense strand of the 2C19 cDNA
to hybridize therewith.

For analysis of the 636 polymorphism, primers can be
designed using the same strategy as discussed for selective
amplification of genomic DNA except that the primers, which
include nucleotide 636, are formed from nucleotide segments
from cDNA rather than genomic sequences.

Amplification products are analyzed using the same 
methods as described for corresponding genomic amplification 
products. 

F. Diagnostic Kits

The invention also provides kits comprising useful
components for practicing the diagnostic methods of the
invention. The kits comprise at least one of the primers
discussed above. Kits usually contain a matched pair of
forward and reverse primers as described above for amplifying
a segment encompassing the 681 and/or the 636 polymorphism.
Some kits contain two matched pairs of primers, e.g., one pair

Facility, Stanford Research Institute, Life Sciences Division,
Menlo Park, CA. Restriction endonucleases were purchased from
Pharmacia LKB Biotechnology, Inc. (Piscataway, NJ).
[α-32P]dCTP (3000 Ci/mmol), [γ-32P]ATP (500 Ci/mmol) and
[α-35S]dATP (650 Ci/mmol) were from Amersham Corp. (Arlington
Heights, IL). All other reagents were of the highest quality
available.

Conditions. Hybridization and washing conditions
for screening libraries with random-labeled cDNAs for 2C13 (g)
or 254c used the same solutions as described for actin, but
were performed at nonstringent temperatures (42°C).
Conditions for hybridization of clones with T300R were
identical with those described above. Hybridization of cDNA
clones with M300R (recognizes 2C9, 2C10, and 2C19)
(5'-ACTTTTCAATGTAAGCAAAT-3') (SEQ. ID. No. 17) was identical
except that for each oligomer the hybridization temperature
and the high-stringency wash were 5°C below the calculated
melting temperatures.
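
The text does not state how the melting temperatures were calculated. As one possibility only, the sketch below uses the common Wallace rule for short oligonucleotides (2°C per A or T, 4°C per G or C) and then subtracts 5°C. The choice of formula and the function names are assumptions.

```python
def wallace_tm(oligo):
    """Approximate melting temperature (degrees C) by the Wallace rule.

    Assumption: the specification does not name a formula; this rule is
    a rough estimate intended for oligonucleotides of about 14-20 bases.
    """
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    if at + gc != len(oligo):
        raise ValueError("sequence contains characters other than A, C, G, T")
    return 2 * at + 4 * gc

def hybridization_temperature(oligo, offset=5):
    """Hybridization and stringent-wash temperature, `offset` degrees below Tm."""
    return wallace_tm(oligo) - offset

oligo = "ACTTTTCAATGTAAGCAAAT"  # the 20-mer quoted in the paragraph above
print(wallace_tm(oligo), hybridization_temperature(oligo))
```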

Example 1: Construction and Screening of Human Liver cDNA 
Libraries 

Two cDNA libraries were constructed from human
livers 860624 and S33, which differed phenotypically in the
hepatic content of P450 HLx (2C8) (SEQ. ID. No. 8). Several
partial cDNA clones were found but no full-length clones.

A second cDNA library (from a liver phenotypically
high in HLx) was then screened. Eighty-three essentially
full-length (>1.8 kb) clones belonging to the 2C subfamily
were isolated from this library. These include full-length
clones for two additional new members of the 2C subfamily.

The majority of the cDNAs characterized in the high-HLx
library (60%) were one of two allelic variants of 2C9,
while 35% represented 2C8 (SEQ. ID. No. 8). Two new genes
were identified (two allelic variants of 2C18 and 2C19).

The two cDNA libraries from individuals
phenotypically high and low in HLx were examined to determine
whether a variant mRNA for 2C8 (SEQ. ID. No. 8) was
responsible for the polymorphic expression of HLx and to
identify additional members of the 2C subfamily. No clones
for 2C8 (SEQ. ID. No. 8) were isolated from the
phenotypically high individual. Two allelic variants for 2C9
were isolated. In addition, full-length cDNAs for two
additional new members (2C18 and 2C19) were isolated. These
new members of the 2C subfamily were expressed in COS-1 cells
and shown to be immunochemically distinct from HLx and 2C9,
and 2C18 metabolized racemic mephenytoin.

Total human liver RNA was prepared by the guanidine
hydrochloride method (Cox, Methods Enzymol. 12:120-129 (1968))
from two human livers either low (860624) or high (S33) in HLx
as identified by immunoblot analysis. Poly(A+) RNA was then
isolated by two passages over an oligo(dT)-cellulose column
(Aviv et al., Proc. Natl. Acad. Sci. U.S.A. 69:1408-1412
(1972)). The low-HLx cDNA library was prepared by Stratagene
Cloning Systems (La Jolla, CA), and the double-stranded cDNA
was treated with S1 nuclease. Following the addition of EcoRI
linkers, the double-stranded cDNA was size-fractionated on a
CL-4B Sepharose column. The largest fraction was ligated into
λZAPII and then transfected into XL1-Blue. The high-HLx cDNA
library was constructed following the methods of Watson et
al., in DNA Cloning (Glover, D.M., Ed.) 1:79-88, IRL Press,
Washington, D.C. (1985). Double-stranded cDNA was ligated to
EcoRI linkers, size-fractionated on an agarose gel (1.8-2.4
kb), and then ligated into λZAPII (Stratagene) and transfected
into XL1-Blue.

The low-HLx library was screened under conditions of
low stringency with a 32P-labeled rat P450 2C13 cDNA probe and
with oligonucleotides for human 2C8 (SEQ. ID. No. 8) (T300R)
(5'-TTAGTAATTCTTTGAGATAT-3') (SEQ. ID. No. 18) and 2C9 (M300R)
(5'-CTGTTAGCTCTTTCAGCCAG-3') (SEQ. ID. No. 19). The high-HLx
library was screened under conditions of low stringency using
a 32P-labeled 254c cDNA probe derived from the first library
and M300R (2C9). Positive clones were isolated, transfected
into XL1-Blue, and excised into the plasmid Bluescript,
according to Stratagene's excision protocol.

Screening the cDNA library constructed from a low-HLx
individual with a cDNA for rat 2C13 under nonstringent
conditions and with oligonucleotide probes specific for 2C8
(SEQ. ID. No. 8) and 2C9 yielded several clones for 2C9 and a
partial DNA, clone 254c, which now appears to be an
incompletely characterized splice variant of the P450 2C
subfamily. None of the clones identified in this library were
full-length. Clone 186 was identical with but 25 base pairs
longer than MP-4, a 2C9 clone previously described by Ged et
al. (1988).

Approximately 40000 plaques were then screened from
the library from liver S33 with the cDNA for 254c under
nonstringent conditions and with an oligonucleotide probe
specific for 2C9. Eighty-three essentially full-length 2C
clones (>1.8 kb) were isolated, purified, and partially or
completely sequenced (Table I). Of these, 29 clones were
found to encode cytochrome P450 2C8 (SEQ. ID. No. 8). One
clone (7b) of 2C8 (SEQ. ID. No. 8) was isolated which was
similar to Hp1-1 and Hp1-2 reported by Okino et al. (1987), but
different by having a tyrosine at position 130 instead of an
asparagine and an isoleucine at 264 instead of a methionine.

TABLE I

Distribution of P450 2C cDNA Clones from Human Liver S33*

Clone                            No. of Clones   % Distribution

2C8 (SEQ. ID. No. 8)                  29              35
2C9
  65 (SEQ. ID. No. 10)                39              47
  25 (SEQ. ID. No. 4)                 11              13
2C10                                   0               0
2C18
  29c (SEQ. ID. No. 6)                 1               1.2
  6b (SEQ. ID. No. 12)                 2               2.5
2C19 (11a) (SEQ. ID. No. 2)            1               1.2
Total                                 83             100

* Clones were classified by hybridization with specific
oligonucleotide probes and partial sequencing.
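
The percentage column of Table I can be recomputed from the clone counts, as sketched below. The dictionary keys are shorthand labels chosen for this illustration, and values recomputed to one decimal place may differ slightly in rounding from those printed in the table.

```python
# Clone counts taken from Table I.
CLONE_COUNTS = {
    "2C8": 29,
    "2C9 (clone 65)": 39,
    "2C9 (clone 25)": 11,
    "2C10": 0,
    "2C18 (29c)": 1,
    "2C18 (6b)": 2,
    "2C19 (11a)": 1,
}

def distribution(counts):
    """Percentage of the total represented by each entry, to one decimal place."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}

total_clones = sum(CLONE_COUNTS.values())
percentages = distribution(CLONE_COUNTS)
print(total_clones, percentages)
```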

There are a number of polymorphisms in the human
CYP2C subfamily. These include variations in the hepatic
levels of HLx (Wrighton et al., Arch. Biochem. Biophys.
306:240-245 (1987)) and metabolic variations in the hepatic
metabolism of S-mephenytoin. The molecular basis for these
polymorphisms has not been characterized. 2C8 (SEQ. ID. No.
8) appears to encode the protein for HLx on the basis of its
N-terminal amino acid sequence (Okino et al., J. Biol. Chem.
262:16072-16079 (1987); Wrighton et al., supra; Lasker et al.,
Biochem. Biophys. Res. Commun. 148:232-238 (1987)).

Example 2: Sequence Analysis 

The Bluescript plasmids containing the positive cDNA
inserts from the low-HLx library were purified by CsCl
gradients, while the plasmids containing cDNA inserts from the
high-HLx library were purified by using Qiagen plasmid
purification kits (Qiagen, Inc., Studio City, CA). The
double-stranded cDNA inserts were sequenced by the dideoxy
chain termination method reported in Sanger et al., J. Mol.
Biol. 162:729-773 (1982), using Sequenase kits (U.S.
Biochemical Corp., Cleveland, OH). The full-length clones 65
(SEQ. ID. No. 10), 25 (SEQ. ID. No. 4), 7b, 11a (SEQ. ID.
No. 2), 29c (SEQ. ID. No. 6) and 6b (SEQ. ID. No. 12) were
sequenced completely in both directions with primers spaced
approximately 20 bases apart. The remaining positive clones
from the high-HLx cDNA library were sequenced in both
directions through both the 5' and 3' ends and through all the
regions which would identify any of the known allelic
variants.

The majority of the clones (50) isolated from the
library from liver S33 coded for 2C9. Interestingly, all of
the 50 clones appeared to be one of two 2C9 allelic variants,
typified by the full-length clones 65 (SEQ. ID. No. 10) and 25
(SEQ. ID. No. 4). All of these clones were sequenced through
the 5' and 3' ends and through regions which would identify
known allelic variants. Thirty-nine of the 2C9 clones were
identical with clone 65 (SEQ. ID. No. 10), and 11 were
identical with clone 25 (SEQ. ID. No. 4).

The nucleotide sequence for clone 65 (SEQ. ID. No.
10) and clone 25 (SEQ. ID. No. 4) is shown in Figure 2.
Clones 25 (SEQ. ID. No. 4) and 65 (SEQ. ID. No. 10) were
identical in the 5'- and 3'-noncoding regions but contained
two single-base changes at positions 1075 and 1425. One of
these base changes was conservative, but the second would
result in one amino acid difference at position 359
(isoleucine versus leucine). Clone 65 (SEQ. ID. No. 9) is
identical in amino acid sequence with human form 2, although
it differs by two silent changes in the coding region and four
differences in the noncoding region (Yasumori et al., 1987).
Clone 65 (SEQ. ID. No. 9) contained a leucine instead of an
isoleucine at position 4, a valine instead of a serine at
position 6, and an arginine instead of a cysteine at position
144 compared to the 2C9 sequenced by Kimura et al. (1987).
The 2C9 reported by Meehan et al. has substitutions at
positions 144, 175, and 238 compared to the clones obtained in
this invention (Meehan et al., Am J Hum Genet., 42:26-37
(1988)).




WO 95/30766 



PCT/US95/05744 




The remaining clones characterized from the human
liver S33 cDNA library encode several novel P450 2C cDNAs.
Their DNA sequences are shown in Figure 2 and their percent
homology with other known 2C members is shown in Table II. Two
of these clones, 29c (SEQ. ID. No. 6) and 6b (SEQ. ID. No.
12), differ by one nucleotide in the coding region (position
1154), which would result in a single amino acid change
(threonine vs methionine at position 385). Clone 29c (SEQ.
ID. No. 6) had a very long (198 bp) 5'-noncoding sequence and
a polyadenylation signal 21 bases from the poly(A) tail.
Clone 6b (SEQ. ID. No. 12) had an unusually long 3'-noncoding
region containing three possible polyadenylation signals with
no poly(A) tail. The differences in the 3'-noncoding region
could represent alternate splicing, allelic variants, or
possibly separate genes. However, these clones are designated
as allelic variants of 2C18 because they differ by only one
base in the coding region. They are most similar to 2C9 (82%
amino acid homology) and 2C19 (SEQ. ID. No. 2) (81% amino acid
homology) (Table II).
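
The correspondence between the coding-region nucleotide positions and the amino acid positions quoted above (1154 and 385 for the 2C18 variants; 1075 and 359 for the 2C9 variants) can be checked with simple integer arithmetic. The sketch assumes that nucleotide 1 is the first base of the initiation codon, an assumption that is consistent with the quoted numbers. The function names are illustrative.

```python
def codon_number(nt_position):
    """1-based codon (amino acid) number containing a 1-based coding nucleotide."""
    if nt_position < 1:
        raise ValueError("positions are 1-based")
    return (nt_position - 1) // 3 + 1

def codon_phase(nt_position):
    """Position within the codon (1, 2 or 3) of a 1-based coding nucleotide."""
    if nt_position < 1:
        raise ValueError("positions are 1-based")
    return (nt_position - 1) % 3 + 1

for nt in (1075, 1154, 1425):
    print(nt, codon_number(nt), codon_phase(nt))
```

Under this numbering, position 1425 falls at the third base of codon 475, which is consistent with a change that does not alter the amino acid.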

A third unique P450 2C cDNA, clone 11a (SEQ. ID.
No. 2) (designated 2C19), was also identified. 2C19 is 92%
homologous in its amino acid sequence to 2C9, 81% homologous
to 2C18, and 79% homologous to 2C8 (SEQ. ID. No. 8). Clone
11a (SEQ. ID. No. 2) had a short 5'-leader sequence and
contained the stop codon, but did not have a polyadenylation
signal or poly(A) tail. Interestingly, no clones for 2C10
(MP-8) were isolated from either library, despite the
sequencing of the 3' region of all 50 putative 2C9 clones.

TABLE II

Percent Homology for Nucleotide and Amino Acid Sequences of P450 2C cDNAs*

Clone                2C8               2C9    29c (2C18)        11a (2C19)
                     (SEQ ID NO. 8)           (SEQ ID NO. 6)    (SEQ ID NO. 2)

29c (2C18)           84                86     100               86
(SEQ ID NO. 6)       89                93     100               93

11a (2C19)           83                94     86                100
(SEQ ID NO. 2)       91                96     93                100

* For each comparison, the upper value represents percent
nucleotide homology, and the lower value represents
percent amino acid homology. The nucleic acid
comparisons include both the coding and 3'-noncoding
regions. The 2C9 sequence used in this comparison was
the cDNA sequence for clone 65.

Figure 4 shows the alignment comparisons for the
deduced amino acid sequences of all known members of the human
CYP2C family, including the three new P450s of the present
invention. The 7 proteins, along with the consensus sequence,
can be aligned with no gaps, and each is predicted to be 490
amino acids long. The amino acid sequences show marked
similarities with many regions of absolute conservation.
Regions of marked conservation are noted from 131 to 180, and
from 302 to 460. These human P450 2C protein sequences also
demonstrate hypervariable regions which may be important for
interactions between the enzyme and substrate. These include
the region from 181-120 and 220-248 as well as 283-296 and a
short region near the carboxyl terminus at 461-479. Notably,
it has been reported that a putative recognition site for
phosphorylation of P450 by cAMP-dependent kinase for P450 2B1
(Arg-Arg-Phe-Ser) at positions 124-127 was conserved in 2C8
(SEQ. ID. No. 8), 2C9, and 11a (2C19), suggesting that these
cytochromes might be regulated by phosphorylation (Muller et
al., FEBS Lett. 187:21-24 (1985)).

However, 2C18 did not contain a serine at this site.
The overall percent homology for both nucleic acid and protein
sequences is summarized in Table II.

Two additional full-length allelic variants of 2C9
have been isolated. One of these clones is identical with
MP-4, but is full-length. It varies from the almost full-length
human form 2 isolated by Yasumori et al., supra, by only two
silent base changes in the coding region and by four changes
in the noncoding region. The number of differences in the
nucleic acid sequences of the presumed allelic variants
isolated by different laboratories ranges from 4 to 17 and the
number of amino acid changes varies from 0 to 4, as
illustrated in Figure 3. Two of the amino acid differences
occur within the first six N-terminal residues, the others
occurring singly throughout the sequence. The effect of these
changes on catalytic activity has not been systematically
studied. In Relling et al., J. Pharmacol. Exp. Ther.
252:442-447 (1990), it was reported that the cDNAs for 2C8
(SEQ. ID. No. 8) and 2C9, when expressed, 4-hydroxylated
racemic mephenytoin but did not metabolize (S)-mephenytoin.
However, the form of isolated 2C9 (human form 2) which is
described in Yasumori et al. (1990) metabolized
(S)-mephenytoin preferentially when expressed in yeast. These
forms differed by only three amino acids. In contrast, Brian
et al., Biochemistry 28:4993-4999 (1989) found that when a
full-length MP-8 (constructed with the first 15 nucleotides
predicted from the known amino acid sequence of P450 MP-1)
was expressed in yeast, it did not metabolize
(S)-mephenytoin. This form would differ from human form 2 by
only two amino acids. Thus, the role of 2C9 in
(S)-mephenytoin metabolism remains controversial.

Example 3: Human RNA Blot Analysis and Hybridization
Conditions

Poly(A+) RNA (10 µg) was electrophoresed in a 1%
agarose gel under denaturing conditions and transferred to a
Nytran filter (Micron Separation, Inc., Westboro, MA), and
filters were then baked for 2 h at 80°C. The filters were
prehybridized for 2 h, then hybridized overnight with a
32P-labeled specific oligonucleotide probe for 2C8 (SEQ. ID.
No. 8) (T300R) at 42°C, washed 3 x 5 min at room temperature
and 1 x 5 min at 42°C with 2 x SSC/0.1% SDS, and
radioautographed. Filters were then stripped with 5 mM Tris
(pH 8.0), 0.2 mM EDTA, 0.05% sodium pyrophosphate, and 0.1 x
Denhardt's for 2 h at 65°C and rehybridized with a
random-primed actin cDNA (Oncor, Gaithersburg, MD) at 50°C
using 6 x SSC, 4 x Denhardt's, and 0.5% SDS. These filters
were washed 1 x 5 min at room temperature, 1 x 10 min at 48°C,
and 4 x 15 min at 48°C and radioautographed as before. The
2C8 mRNA band was quantitated by scanning with an LKB
Ultrascan laser densitometer, and the values of the integrated
peaks were divided by those of the actin peaks.
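
The normalization described here is a simple ratio of integrated peak areas, from which fold differences between samples can then be taken. In the sketch below the peak values and sample labels are invented purely for illustration and are not measurements from Figure 5.

```python
def normalize_to_actin(target_peak, actin_peak):
    """Integrated 2C8 peak divided by the actin peak from the same lane."""
    if actin_peak <= 0:
        raise ValueError("actin peak must be positive")
    return target_peak / actin_peak

def fold_difference(normalized_a, normalized_b):
    """How many times higher sample A is than sample B after normalization."""
    return normalized_a / normalized_b

# Hypothetical densitometer readings (arbitrary units).
sample_high = normalize_to_actin(target_peak=140.0, actin_peak=100.0)
sample_low = normalize_to_actin(target_peak=2.0, actin_peak=100.0)
print(sample_high, sample_low, fold_difference(sample_high, sample_low))
```

Dividing by the actin signal corrects for lane-to-lane differences in the amount of RNA loaded and transferred.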

Hybridization with T300R was negligible in mRNA from
860624 compared to S33 and a number of other liver samples
(Figure 5). When corrected for hybridization with the actin
probe, the amounts of 2C8 (SEQ. ID. No. 8) mRNA were
consistent with the relative amounts of HLx observed in
Western blot analysis. Laser scans of the autoradiographs
indicated that 2C8 (SEQ. ID. No. 8) mRNA levels in sample
860624 were at least 70-fold lower than in S33 and 3- to
15-fold lower than in any of the remaining samples.

Example 4: Cell Expression Studies

cDNA inserts were ligated into the cloning region of
the expression plasmids pSVL (Pharmacia LKB Biotechnology,
Inc., Piscataway, NJ) or pcD (Okayama et al., Mol. Cell. Biol.
3:280-289 (1983)) and used to transform COS-1 cells. COS-1
cells were placed at (1-2) x 10^6 cells per 1-cm dish and grown
for 24 h in Dulbecco's-modified Eagle's medium with 10% fetal
bovine serum (DMEM). The cells were then washed with
Dulbecco's phosphate-buffered saline (PBS) and transfected
with recombinant plasmid (3 µg per dish) in DEAE-dextran (500
µg/mL) for 30 min-1 h at 37 °C. The transfected cells were
then treated with chloroquine (52 µg/mL) in DMEM for 5 h
(Luthman et al., Nucleic Acids Res. 11:1295-1308 (1983)),
washed with PBS, refed with DMEM, and incubated for 72 h prior
to harvest. Typically, 15-20 dishes were transfected with
each recombinant plasmid. For Western blot analysis of the
recombinant transformed COS-1 cells, cells were scraped from
the dishes into buffer (50 mM Tris-HCl, pH 7.5, 150 mM KCl, and
1 mM EDTA) and lysed with 3 x 5 s bursts with a Polytron. A
portion of each lysate was centrifuged at 9000g and then
10000g for the preparation of a microsomal fraction. Western
blots were then performed as described above. Total RNA was
isolated from transfected COS-1 cells, and Northern blots were
performed as described for human samples. The filters were
hybridized with a 32P-labeled oligonucleotide probe which
hybridizes with all 2C clones isolated (2C500R)
(5'-GGAGCACAGCCCAGGATGAA-3') (SEQ. ID. No. 20) at 55 °C, and
radioautographed.

The two variant cDNAs for 2C9, the two variant cDNAs
for 2C18, and the cDNA for 2C19 were inserted into expression
vectors and transfected into COS-1 cells. Cell lysates were
prepared and immunoblotted by using antibody to HLx and P450
2C9. The results are shown in Figure 4. Transfection of COS-1
cells with the two variants of 2C9 (25 (SEQ. ID. No. 4) and
65 (SEQ. ID. No. 10)) resulted in the expression of a protein
(SEQ. ID. No. 3) with a molecular weight equal to that of pure
2C9. In contrast, neither 2C18 (either variant) nor 2C19 was
detected by antibody to HLx or 2C9. However, Northern blot
analysis indicated that all three cDNAs had been successfully
transfected into these cells. The sizes of the transcripts
were those expected for the constructs. The somewhat lesser
hybridization of the 2C oligoprobe with RNA from cells
transfected with 11a (SEQ. ID. No. 2) reflects a lower amount
of RNA in this sample as shown by the hybridization with the
actin probe.

Example 5: Expression of Cytochrome P450 2C19 and 2C18
Polypeptides in a Stable Cell Line
1. Materials

(a) Liver Samples and Chemicals
Human liver samples were obtained from Dr. Fred
Guengerich, University of Vanderbilt, Nashville, TN.
Restriction endonucleases were purchased from Stratagene
Cloning Systems (La Jolla, CA). [α-32P]dCTP (3000 Ci/mmol),
[γ-32P]ATP (5000 Ci/mmol) and [α-35S]dATP (650 Ci/mmol) were
from Amersham Corp. (Arlington Heights, IL). Nirvanol was
obtained from Adrian Küpfer, University of Berne, Switzerland
and separated into its R- and S-enantiomers as described by
Sobotka et al., J. Amer. Chem. Soc. 54:4697-4702 (1932).
Radiolabelled S- and R-mephenytoin (N-methyl-14C) were
synthesized by E.I. DuPont de Nemours & Co., Inc. (Wilmington,
DE) by methylation of R- and S-nirvanol. The radiochemical
purity of both isomers was greater than 90% as assessed by
HPLC. A single impurity which accounted for less than 2% of
the parent compound was not characterized, since it eluted
after the metabolites and parent compound. Moreover, the
percentage of the impurity remained the same (less than 2%)
before and after incubations. All sequencing was done by the
dideoxy method using Sequenase Kits (U.S. Biochemical Corp.,
Cleveland, OH). The specific activities of the S- and
R-enantiomers were 20.7 and 20.9 mCi/mmol respectively. All
other reagents used are listed below or were of the highest
quality available.

(b) Additional Sequences of 2C cDNAs Used in the
Expression Studies

Two full-length clones of 2C8 (7b and 7c) described
in Romkes et al., Biochemistry 30:3247-3255 (1991), were
sequenced through the coding region in the present study. The
sequences were similar to that of the 2C8 (HP1-1) reported by
Okino et al., supra; however, both clones had coding changes
at position 390 (A→C) (Asn130→Thr) and G→C at position 792
(Met264→Ile) and a change in the noncoding region at
1497 (T→C). These changes presumably represent a second
allelic variant of 2C8. The Thr130 and Ile264 amino acids
found in our 2C8 clones are conserved in the remainder of the
human P450 2C subfamily (2C9, 2C18, and 2C19) and are
therefore consistent with the amino acid substitutions in
other members of this subfamily.

(c) Yeast Strains and Media

Saccharomyces cerevisiae 334 (MATα, pep4-3, prb1-1122,
ura3-52, leu2-3,112, reg1-501, gal1), a protease-deficient
strain kindly provided by Dr. Ed Perkins (NIEHS),
was used as the recipient strain in these studies and
propagated non-selectively in YPD medium (1% yeast extract, 2%
peptone, 2% dextrose) (Hovland et al., Gene 83:57-64 (1989)).
For the selection of Leu+ transformants, the cells were grown
in synthetic complete medium minus leucine (Rose et al.,
Methods in Yeast Genetics (Rose et al., eds.) pp. 180-187,
C.S.H.P., NY 1990). Plates were made by the addition of 2%
agar.

2. Methods

(a) Amplification of 2C18 and 2C9 RNA for Direct
Sequencing

Total RNA from selected human liver samples was
isolated by the single-step method (Chomczynski et al., Anal.
Biochem. 163:156-159 (1987)), using TRI REAGENT™ (Mol. Res.
Center, Inc., OH). RNA (10 µg) was reverse transcribed using
2.6 µM random hexamers as the 3'-primer by incubating for
1 hour at 42 °C using 2.5 U/µl of M-MLV reverse transcriptase
(BRL, Grand Island, NY) in 10 mM Tris-HCl, pH 8.3, 5 mM KCl,
5 mM MgCl2, 1 U/µl RNase inhibitor (Promega, Madison, WI) and
1 mM each of dATP, dCTP, dGTP, and dTTP (Perkin Elmer Cetus,
Norwalk, CT). The samples were then heated for 5 minutes at
99 °C to terminate the reverse transcription.

The cDNA was then amplified for a region containing
the allelic differences in 2C18 and 2C9 using a nested PCR
method. The DNA was amplified in 1X PCR buffer (50 mM KCl,
10 mM Tris-HCl, pH 8.3) containing 1 mM MgCl2, 0.2 mM each of
dATP, dCTP, dGTP, dTTP and 20 pmol of each of the 5' and 3'
primers in a final reaction volume of 100 µl. The reaction
mixture was heated at 94 °C for 5 minutes before addition of
2.5 U of AmpliTaq DNA polymerase (Perkin Elmer Cetus). For
PCR of 2C18, the 3'-primer was 5'-TGGCCCTGATAAGGGAGAAT-3'
(SEQ. ID. No. 23) and the 5'-primers were
5'-ATCCAGAGATACATTGACCTC-3' (SEQ. ID. No. 24) (outer) and
5'-CCATGAAGTGACCTGTGATG-3' (SEQ. ID. No. 25) (inner). For
2C9, the 3'-primer was 5'-AAAGATGGATAATGCCCCAG-3' (SEQ. ID.
No. 26) and the 5'-primers were 5'-GAAGGAGATCCGGCGTTTCT-3'
(SEQ. ID. No. 27) (outer) and 5'-GGCGTTTCTCCCTCATGACG-3'
(SEQ. ID. No. 28) (inner). The outer amplification was
performed for 20 cycles consisting of denaturation at 94 °C for
1 minute, annealing at the appropriate temperature for
30 seconds, and extension at 72 °C for 1 min. After a 50-fold
dilution, PCR was carried out similarly with the inner primers
for 35 additional cycles.

The PCR products were purified using a Centricon-30,
dried, suspended in 40 µl of sterile water, and sequenced
using Sequenase Kits and a 33P-end-labeled sequencing primer.
For 2C18, the primer used was 2C18.1184R 5'-TTGTCATTGTGCAG-3'
(SEQ. ID. No. 29). Sequencing primers for 2C9 were 2C9.1030F
5'-CACATGCCCTACACA-3' (SEQ. ID. No. 30), 2C9.385F
5'-TGACGCTGCGGAATT-3' (SEQ. ID. No. 31), and 2C9.783F
5'-GGACTTTATTGATTG-3' (SEQ. ID. No. 32).
Full length 2C9 cDNA was also amplified by PCR from
a human liver with high S-mephenytoin 4'-hydroxylase activity
using the primers 5'-ATGATTCTCTTGTGGTCCT-3' (SEQ. ID. No. 33)
and 5'-AAAGATGGATAATGCCCCCAG-3' (SEQ. ID. No. 34). The PCR
reaction was similar to above, except that the primer
concentrations were increased 10-fold (0.25 µM). The PCR
products were then cloned into the pCR1000 vector using the TA
Cloning System (Invitrogen, San Diego, CA) and sequenced to
identify the allelic variant present.

(b) Plasmid Construction and Methods for Amplifying
Full-length 2C18 and 2C19 cDNAs by PCR

The strategy for cloning the P450 2C cDNAs into the
yeast vector pAAH5 is described below. The 5'-noncoding
sequence of the P450 2C cDNAs was eliminated by PCR
amplification to optimize expression in yeast cells. The
5'-primer introduced a Hind III cloning site and a six A-residue
consensus sequence upstream of the ATG codon to promote
efficient translation in yeast (Hamilton et al., Nucl. Acids
Res. 15:3581-3593 (1987), Cullin et al., Gene 65:203-217
(1988)). The 3'-primer was positioned between the stop codon
and polyadenylation site and introduced a second Hind III
site. cDNA inserts in the pBluescript vector (0.1 µg) (Romkes
et al., (1991), supra) were amplified by PCR as described
before except that the reaction contained 3.5 mM MgCl2,
0.25 µM each of the 5'- and 3'-primers, and 1 µl PerfectMatch
(Stratagene, La Jolla, CA). Amplification was performed in
sequential cycles, with the first cycle including denaturation
for 1 min at 94 °C, annealing at the appropriate temperature
for 1 min, and polymerization at 72 °C for 3 min. The
remaining 24 cycles consisted of a denaturation step at 94 °C
for 1 min and a combined annealing/extension step at 72 °C for
3 min. After the last cycle, all samples were incubated an
additional 10 min at 72 °C. The primers used were:

2C8: 5'-GCAAGCTTAAAAAAATGGAACCTTTTGTGGTCCT-3' (SEQ. ID.
No. 35) and 5'-GCAAGCTTGCCAGATGGGCTAGCATTCT-3' (SEQ. ID.
No. 36); 2C9: 5'-GCAAGCTTAAAAAAATGGATTCTCTTGTGGTCCT-3' (SEQ.
ID. No. 37) and 5'-GCAAGCTTGCCAGGCCATCTGCTCTTCT-3' (SEQ. ID.
No. 38); 2C19: 5'-GCAAGCTTAAAAAAATGGATTCTCTTGTGGTCCT-3' (SEQ.
ID. No. 39) and 5'-GCAAGCTTGCCAGACCATCTGTGCTTCT-3' (SEQ. ID.
No. 40).
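The design rules stated for these primers (a Hind III site in both primers, and six A residues directly upstream of the ATG in the 5'-primer) can be checked mechanically. The sketch below is illustrative only: the primer strings are transcribed from the list above, and the names `PRIMERS` and `check_primer_pair` are hypothetical helpers, not part of the described method.

```python
import re

HIND_III = "AAGCTT"  # Hind III recognition sequence

# (5'-primer, 3'-primer) pairs transcribed from SEQ. ID. Nos. 35-40.
PRIMERS = {
    "2C8": ("GCAAGCTTAAAAAAATGGAACCTTTTGTGGTCCT", "GCAAGCTTGCCAGATGGGCTAGCATTCT"),
    "2C9": ("GCAAGCTTAAAAAAATGGATTCTCTTGTGGTCCT", "GCAAGCTTGCCAGGCCATCTGCTCTTCT"),
    "2C19": ("GCAAGCTTAAAAAAATGGATTCTCTTGTGGTCCT", "GCAAGCTTGCCAGACCATCTGTGCTTCT"),
}


def check_primer_pair(forward, reverse):
    """Return a dict of booleans for the stated design features."""
    return {
        "forward_has_hind3": HIND_III in forward,
        "reverse_has_hind3": HIND_III in reverse,
        # Hind III site, then exactly six A residues, then the ATG start codon.
        "a6_before_atg": re.search(HIND_III + r"A{6}ATG", forward) is not None,
    }


report = {name: check_primer_pair(fwd, rev) for name, (fwd, rev) in PRIMERS.items()}
```

Each entry of `report` should contain only `True` values if the transcribed sequences follow the design described in the text.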

The PCR products were cloned into the pCR1000 vector
(Invitrogen, San Diego, CA). Recombinant plasmids were
isolated from E. coli (INVαF') cells using Qiagen plasmid
purification kits, and the PCR products were completely
sequenced as described above to verify the fidelity of the PCR
reaction. A mutation of Asp2→Val was initially introduced
inadvertently in 29c via the primers utilized due to an error
in the original sequencing at this position. Therefore, the
correct 2C18-Asp2 cDNAs were cloned into the pAAH5 vector by
an alternate strategy. The 3'-end was cut with NdeI, blunted,
and ligated to a SmaI/HindIII adapter. The clone was then
partially digested with BamHI, which cuts after the initiation
ATG as well as internally, and the intact 1700 bp fragment gel
purified. A BamHI/HindIII linker was prepared from the oligos
5'-AGCTTAAAAAAATG-3' (SEQ. ID. No. 41) (upper) and
5'-GATCCATTTTTTTA-3' (SEQ. ID. No. 42) (lower), annealed, and
ligated to the cDNA fragment to introduce a HindIII cloning
site and regenerate the ATG codon.
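The two linker oligos can be annealed on paper to confirm that the duplex carries a HindIII-compatible overhang on one side, a BamHI-compatible overhang on the other, and the ATG in between. The following is a sketch of that check; `revcomp` and `anneal` are hypothetical helper names introduced here for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def revcomp(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def anneal(upper, lower):
    """Pair the 3' end of `upper` with the 3' end of `lower`.

    Returns (upper 5' overhang, duplex read on the upper strand,
    lower 5' overhang), all written 5'->3'.
    """
    lower_rc = revcomp(lower)
    # Longest suffix of the upper strand that matches the start of the
    # reverse-complemented lower strand is the double-stranded region.
    for n in range(min(len(upper), len(lower_rc)), 0, -1):
        if upper[-n:] == lower_rc[:n]:
            return upper[:-n], upper[-n:], lower[: len(lower) - n]
    raise ValueError("oligos share no complementary 3' region")


upper_overhang, duplex, lower_overhang = anneal("AGCTTAAAAAAATG", "GATCCATTTTTTTA")
```

With these sequences the upper strand leaves a 5'-AGCT overhang (as produced by HindIII) and the lower strand a 5'-GATC overhang (as produced by BamHI), and the duplex region ends in ATG.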

The PCR amplified cDNAs were isolated by Hind III
digestion, ligated into the pAAH5 yeast expression vector, and
the proper orientation confirmed by restriction analysis and
sequencing. The expression vector pAAH5, which contains the
yeast ADH1 promoter and terminator regions and the Leu2
selectable marker, was kindly provided by Dr. M. Negishi
(NIEHS). The recombinant plasmids were isolated from E. coli
DH5α cells using Qiagen plasmid purification kits and
transformed into yeast as described previously (Faletto et
al., J. Biol. Chem. 267:2032-2037 (1992)), using the lithium
acetate method of Ito et al., J. Bacteriol. 153:163-168
(1983).

(c) Immunoblots and Cytochrome P450 Determinations 
Yeast microsomes or whole cell lysates were prepared
from transformed cells isolated at mid-logarithmic phase as
described previously (Oeda et al., supra) with slight
modifications (Faletto et al., supra) and stored at -80 °C in
0.1 M phosphate (pH 7.4) containing 20% glycerol and 0.1 mM
EDTA. Protein concentrations were determined by the method of
Bradford et al., Anal. Biochem. 72:248-254 (1976).
SDS-polyacrylamide gel electrophoresis and western blots were
performed on yeast microsomes or whole cell lysates (Faletto
et al., supra) and immunoblots probed with antibody to the
appropriate P450 as described (Yeowell et al., Arch. Biochem.
Biophys. 243:408-419 (1985)). Cytochromes P450 2C8, P450 2C9
and NADPH:P450 reductase were purified from human liver
microsomes (Raucy et al., Methods in Enzymol. 208:577-587
(1991)) and antibodies to 2C8 and 2C9 prepared in rabbits as
previously described (Leo et al., Arch. Biochem. Biophys.
269:305-312 (1988)). Specific peptides NH2-CIDYIiPGSHNKIAENFA-COOH
(SEQ. ID. No. 43) (amino acids 231-249) for P450 2C18 and
NH2-CLAFMESDILEKVK-COOH (SEQ. ID. No. 44) (amino acids 236-249)
for 2C19 were selected from amino acid regions where these
P450s vary from other known 2C subfamily members (Romkes et
al., (1991), supra). These peptides were synthesized,
conjugated to bovine serum albumin via
m-maleimidobenzoyl-N-hydroxysuccinimide ester, and antibodies
to the conjugates raised in rabbits by BIOSYNTHESIS INC.
(Denton, TX). E. coli lysate (4 mg/ml) was added to the
primary peptide antibody in the first step of the immunoblot
procedure to block non-specific
reactions of these rabbit antibodies to yeast cell wall
proteins. Cytochrome P450 concentrations of microsomes were
determined by dithionite-reduced carbon monoxide difference
spectra by the method of Omura et al., J. Biol. Chem.
239:2370-2378 (1964) using an extinction coefficient of
91 mM^-1 cm^-1.
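The conversion from a CO difference spectrum to a P450 concentration is a direct application of the Beer-Lambert law with the stated extinction coefficient. The sketch below assumes the conventional reading of the absorbance difference between 450 and 490 nm and a 1 cm cuvette; those details, the function names, and the example numbers are assumptions for illustration and are not taken from the text.

```python
EXTINCTION_MM = 91.0  # mM^-1 cm^-1, the coefficient stated in the text


def p450_nmol_per_ml(delta_a, path_cm=1.0, dilution=1.0):
    """P450 concentration from the CO difference absorbance (assumed A450 - A490)."""
    conc_mM = delta_a / (EXTINCTION_MM * path_cm)
    # 1 mM = 1 umol/ml = 1000 nmol/ml
    return conc_mM * 1000.0 * dilution


def specific_content_pmol_per_mg(delta_a, protein_mg_per_ml, path_cm=1.0, dilution=1.0):
    """P450 specific content of a microsomal suspension in pmol per mg protein."""
    nmol_per_ml = p450_nmol_per_ml(delta_a, path_cm, dilution)
    return nmol_per_ml * 1000.0 / protein_mg_per_ml


# Hypothetical reading: delta A = 0.0091 in a cuvette holding 2 mg protein/ml.
example_content = specific_content_pmol_per_mg(0.0091, protein_mg_per_ml=2.0)
```

An absorbance difference of 0.0091 at 2 mg protein/ml corresponds to about 50 pmol P450 per mg, which is in the range reported for the recombinant yeast microsomes.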

Microsomes of human livers were prepared as
described by Raucy et al., supra. SDS-polyacrylamide gel
electrophoresis and immunoblot analysis were performed as above
except that immunoblots were developed using the ECL (enhanced
chemiluminescence) Western blotting kit from Amersham (UK).
Immunoblots were scanned with a laser densitometer (LKB
Instruments).

(d) Purification of Cytochromes from Recombinant
Yeast Microsomes

Recombinant yeast microsomes were prepared from a
10-12 l culture, and recombinant P450s were purified by
aminooctyl-Sepharose chromatography as described by Iwasaki et
al., J. Biol. Chem. 226:3380-3382 (1991). The Emulgen was
then removed from protein by adsorption of the protein to a 4 g
hydroxylapatite column (Hypatite C, Clarkson Chemical Company,
Williamsport, PA) equilibrated with 10 mM potassium phosphate
buffer (pH 7.2), 20% glycerol, 0.1 mM EDTA, and 0.1 mM DTT and
washing the column with the same buffer until the absorbance
at 280 nm returned to zero. The P450 was then eluted with
4090 mM DTT, and dialyzed overnight against 100 mM potassium
phosphate buffer (pH 7.4), 20% glycerol and 0.1 mM EDTA.
Absolute and CO difference spectra of purified P450s were
determined in the same buffer but containing 0.2% Emulgen and
0.5% cholate.

(e) Tolbutamide Hydroxylase Assays
Tolbutamide hydroxylase activity was measured
according to Knodell et al., J. Pharmacol. Exper. Ther.
241:1112-1119 (1987), with several modifications. Yeast
microsomes (1 mg protein) were preincubated with 300 pmol
hamster P450 reductase in 0.2 ml of the incubation buffer
(below) for 3 min at 37 °C. The reaction was then placed on
ice and incubated in 0.2 ml of 50 mM HEPES buffer (pH 7.4)
containing 1.5 mM MgCl2, 0.1 mM EDTA in a final volume of 1 ml
and 1 mM sodium tolbutamide. The reaction was initiated with
0.5 mM NADPH. Human liver microsomes (0.22 mg protein) were
incubated without reductase. Incubations with reconstituted
recombinant P450s contained 50 pmol purified P450 enzyme,
150 pmol P450 reductase, and 15 µg
dilauroylphosphatidylcholine, and were performed in 100 mM
potassium phosphate buffer (pH 7.4). Reactions were terminated
after 60 min at 37 °C by the addition of 50 µl of 4N HCl,
followed by extraction with 3 ml of water-saturated ethyl
acetate. The ethyl acetate extracts were dried under nitrogen
at 40 °C, the residue resolubilized in 200 µl methanol, and
4-hydroxytolbutamide then assayed using HPLC by injecting
50 µl of the solubilized extract onto a µBONDAPAK C18 column
(4.6 x 300 mm) using 0.05% phosphoric acid, pH 2.6:acetonitrile
(6:4, v/v) as the mobile phase with a flow rate of 1 ml/min.
The column eluate was monitored at 230 nm and rates of product
formation were determined from standard curves prepared by
adding varying amounts of 4-hydroxytolbutamide to incubations
conducted without NADPH. Preliminary experiments confirmed
that 4-hydroxytolbutamide formation by human liver microsomes
(30-120 pmol P450) was linear for up to 90 min. Samples were
analyzed in triplicate.
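Reading a rate of product formation off a standard curve, as described above, can be sketched as a least-squares line through the standards followed by an inverse lookup. The peak areas and amounts below are hypothetical, and the helper names are illustrative; because the standards are carried through the same extraction and injection as the samples, no separate recovery or injection-volume correction is applied in this sketch.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def product_nmol(peak_area, slope, intercept):
    """Amount of 4-hydroxytolbutamide corresponding to a sample peak area."""
    return (peak_area - intercept) / slope


def rate_nmol_per_min_per_mg(nmol, minutes, mg_protein):
    return nmol / minutes / mg_protein


# Hypothetical standards: nmol of 4-hydroxytolbutamide added vs. HPLC peak area.
standard_nmol = [0.0, 1.0, 2.0, 4.0]
standard_area = [5.0, 105.0, 205.0, 405.0]

slope, intercept = fit_line(standard_nmol, standard_area)
sample_nmol = product_nmol(155.0, slope, intercept)
sample_rate = rate_nmol_per_min_per_mg(sample_nmol, minutes=60, mg_protein=1.0)
```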

(f) Mephenytoin 4'-Hydroxylase Assay
Mephenytoin 4'-hydroxylase activity was measured by
a modification of the radiometric HPLC assay described by
Shimada et al., J. Biol. Chem. 261:909-921 (1986), as
described below. Purified or recombinant yeast microsomes
(10-50 pmol) were preincubated with
dilauroylphosphatidylcholine (15 µg per 50 pmol P450), P450
reductase (500 U per 50 pmol P450), and human cytochrome b5
(2:1 molar ratio when added). The reconstituted mixture was
preincubated for 5 min at 37 °C, and then placed on ice. A
final concentration of 0.4 mM radiolabelled S- or
R-mephenytoin (20.7 mCi/mmol and 20.9 mCi/mmol) was added to
50 mM HEPES buffer (pH 7.4) containing 0.1 mM EDTA and 1.5 mM
MgCl2 for recombinant 2C proteins. The mixture was then
incubated at 37 °C with shaking for 3 min, and the reaction
started with the addition of 2 mM NADPH and terminated after
30 min with an equal volume of methanol. Cytochrome b5 was not
included in all CYP2C18 reactions, since it had no effect or
produced a slight inhibition on the activity of this CYP protein.
Reaction volumes were generally 0.25 ml except when the volume
of recombinant purified cytochrome or yeast microsomes was
greater than 50 µl. In these cases, the volume was increased
to 0.5 ml to limit the volume of glycerol from the purified
preparation to <4% of the final volume. Incubations with
human microsomes did not contain exogenous P450 reductase or
cytochrome b5, and they were carried out in 0.1 M phosphate
buffer (pH 7.4) instead of HEPES buffer. Initial experiments
showed that S-mephenytoin hydroxylase activity of human liver
microsomes was linear for at least 60 minutes and from 0.05
through 0.2 mg microsomal protein, and that of the
R-enantiomer was linear through 1 mg microsomal protein.

At the end of the incubation period, the reactions
were terminated with an equal volume of methanol. The
incubation mixture was centrifuged at 10,000g for 10 min and
an aliquot assayed directly using HPLC without extraction.
Samples with particularly low activity were concentrated by
lyophilization and redissolved in a small volume of
methanol:water (1:1) before assay. The HPLC system consisted
of a reverse phase C18 (10 µm) Versapak, 300 mm x 4.1 mm column
(Alltech Associates, Deerfield, IL) using an isocratic solvent
consisting of methanol:water (45:55) with a flow rate
of 1 ml/min for 25 min. Detection of radioactive peaks was
accomplished using an on-line Flow-One radiochemical detector
(Radiomatic Instruments Co., Tampa, FL). Detection of the
unlabeled 4'-hydroxymephenytoin authentic standard was
performed using an on-line multiwavelength UV detector at both
211 and 230 nm.
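In a radiometric assay of this kind, the radioactivity in the product peak is converted to an amount of product through the specific activity of the substrate. The sketch below assumes the detector output has already been corrected to disintegrations per minute and that a known fraction of the incubation was injected; both are assumptions, as are the function names and the example numbers.

```python
DPM_PER_MCI = 2.22e9  # 1 Ci = 3.7e10 disintegrations/s = 2.22e12 dpm


def dpm_per_nmol(specific_activity_mci_per_mmol):
    """Convert a specific activity in mCi/mmol to dpm per nmol."""
    return specific_activity_mci_per_mmol * DPM_PER_MCI / 1e6  # 1 mmol = 1e6 nmol


def turnover(product_dpm, fraction_analyzed, specific_activity, minutes, nmol_p450):
    """Product formation rate in nmol/min/nmol P450."""
    nmol_product = product_dpm / fraction_analyzed / dpm_per_nmol(specific_activity)
    return nmol_product / minutes / nmol_p450


# S-mephenytoin at 20.7 mCi/mmol, as stated in Materials.
s_dpm_per_nmol = dpm_per_nmol(20.7)

# Hypothetical peak: 34465.5 dpm from half of a 30 min incubation with 10 pmol P450.
example_turnover = turnover(34465.5, 0.5, 20.7, minutes=30, nmol_p450=0.01)
```

At 20.7 mCi/mmol one nmol of substrate or product carries roughly 46,000 dpm, so the hypothetical peak above corresponds to a turnover of about 5 nmol/min/nmol P450.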
(g) Statistical analyses

Tolbutamide hydroxylase and mephenytoin hydroxylase
activities of microsomes prepared from different recombinant
yeasts were compared by analysis of variance and by Fisher's
least significant difference test (Carmer et al., Am. Stat.
Assoc. 68:66-74 (1973)).

3. Results

(a) Expression of P450 2C cDNAs in yeast 

Western blot analysis confirmed the expression of
the recombinant human CYP2C proteins in the recombinant yeast
(Fig. 6). Antibodies to 2C8 and 2C9 recognized polypeptide
bands of approximately 50,000 daltons (2C8) and 55,000 daltons
(2C9) which corresponded in mobility to those of the
recombinant proteins purified from yeast microsomes. These
mobilities corresponded to those of the corresponding 2C8 and
2C9 proteins purified from human liver. 2C19 was recognized
by antibodies to both the 2C9 and the 2C19 peptides. This
protein corresponded in mobility (<50,000 daltons) to the
lowest of three bands in Western blots of human liver
microsomes probed with antibody to human 2C9. The mobility of
2C18 was intermediate between that of 2C8 and 2C19.
Antibodies to 2C18 and 2C19 peptides were specific for their
antigen; however, antibody to 2C9 cross-reacted strongly with
2C19 and weakly with 2C8 and 2C18.

CO difference spectral analysis indicated that the
recombinant P450 2C proteins were expressed at levels as high
as 160-250 pmol/mg protein in some yeast microsomal
preparations. 2C18, 65 (2C9), and 25 (2C9) were expressed at
levels of 20 to 60 pmol/mg microsomal protein. Initially, 11a
(2C19) was expressed extremely poorly, and the CO difference
spectrum of the recombinant 2C19 yeast was indistinguishable
from that of control yeast (<7 pmol/mg protein). However,
after repeated transfections and selection, expression of 2C19
at ~17 pmol/mg protein was achieved. All of the CYP2C
proteins were low spin hemoproteins. CYP2C18 appeared to be
somewhat unstable in yeast microsomes with a large proportion
(~1/3 to 1/2) of the P450 being converted to P420 in the
presence of dithionite and carbon monoxide. None of the other
recombinant CYP2C proteins showed this lack of stability.

(b) Optimization of Tolbutamide and S-Mephenytoin
Hydroxylase Assays

Preliminary studies indicated that exogenous P450
reductase (500 U/50 pmol P450) stimulated metabolism of
tolbutamide by recombinant 2C9 in yeast microsomes >10-fold
and stimulated S-mephenytoin hydroxylase activity
approximately 2-fold. Activity of the recombinant 2C proteins
was linear with amount of P450 for 30 minutes through at least
20 pmol P450 for 2C19 (Fig. 7) and 50 pmol for the other CYP2C
forms. Cytochrome b5 stimulated S-mephenytoin hydroxylase
activity of both 2C9 and 2C19 in yeast microsomes and the
optimal ratio of b5 to P450 was approximately 2:1, but it
generally had no effect or produced a slight inhibition of
mephenytoin hydroxylase activity of 2C18 (Fig. 8). This
difference is consistent with the fact that all of the CYP2C
proteins except 2C18 contain a Ser at position 128 which is a
recognition site for cAMP protein kinase
(125Arg-Arg-Phe-Ser128) (Müller et al., FEBS Lett. 187:21-24
(1985)), and this sequence is also thought to be part of a b5
binding site (Jansson et al., Arch. Biochem. Biophys.
259:441-448 (1987)); 2C18 contains Cys at position 125.

Mephenytoin 4'-hydroxylase activity of recombinant
yeast microsomes was consistently higher in HEPES than
phosphate buffer, while activity of human liver microsomes was
~2-fold higher in phosphate buffer (pH 7.4). Therefore,
recombinant proteins were subsequently assayed in HEPES buffer
with exogenous reductase and cytochrome b5 except for 2C18
which was tested both with and without cytochrome b5. Human
liver microsomal activities were assayed in phosphate buffer.
(c) Mephenytoin hydroxylase activity of recombinant
human 2C proteins

S-mephenytoin 4'-hydroxylase activities of yeast
microsomes containing recombinant human CYP2C proteins were
compared under the optimized conditions described above. HPLC
profiles of the metabolites of S-mephenytoin produced by human
liver microsomes and recombinant human CYP2C proteins are
shown in Fig. 9 and the results summarized in Table III.
Recombinant 2C19 4'-hydroxylated S-mephenytoin at a rate of
~5 nmol/min/nmol P450, which was an order of magnitude
higher than the rate of 4'-hydroxylation in human liver
microsomes (Table III and Fig. 9). The retention time
(5-6 min) of the 4'-hydroxymephenytoin metabolite was identical
to that of the authentic unlabeled standard. 2C19 also
produced small quantities of two unknown metabolites eluted at
3-4 and 7-8 min. These unknown metabolites were also produced
by liver microsomes, and the metabolite with the shorter
retention time was the principal metabolite produced by 2C8.
Parent S-mephenytoin eluted at 14-15 min, followed by the
unknown impurity which eluted at 16-17 min. Similar retention
times were observed for R-mephenytoin and its metabolites.

The rate of 4'-hydroxymephenytoin formation by 2C19
was at least 100-fold higher than that of 2C9 (both alleles),
2C18 (both alleles) and 2C8 (Table III). The rate of
4'-hydroxylation of S-mephenytoin by 2C8 appeared to be lower
than that of 2C9 (0.02 nmol/min/nmol). The 4'-hydroxylation
of mephenytoin by 2C19 was stereospecific; the rate of
S-hydroxylation was at least 30-fold higher than that of
R-hydroxylation (Table III). In contrast, the 4'-hydroxylation
of mephenytoin by the other human CYP2C proteins did not
appear to be stereospecific.
TABLE III

S-Mephenytoin 4'-Hydroxylase Activities in
Recombinant Human CYP2C Yeast Microsomes

                                Mephenytoin 4'-Hydroxylase Activity
                                       (nmol/min/nmol P450)
Microsomes                      S-Mephenytoin          R-Mephenytoin        R/S Ratio

Controls                        0.028 ± 0.001          0.024 ± 0.003        0.9
2C9-Ile359 (65)                 0.043 ± 0.000          0.041 ± 0.005        0.9
2C9-Leu359 (25)                 0.031 ± 0.009          0.040 ± 0.01         1.3
2C8                             0.037 ± 0.001          0.016 ± 0.001        0.4
2C18-Thr385 (29c) + b5          0.042 ± 0.004          0.054 ± 0.003 a      1.3
2C18-Thr385 (29c), no b5        0.034 ± 0.008
2C18-Met385 (6b)                0.023 ± 0.004          0.019 ± 0.005        0.9
2C19 (11a)                      4.6 ± 0.3 a,b,d        0.014 ± 0.02 a       0.03
Human liver microsomes HB1 6    0.283 ± 0.037 a,c,d    0.117 ± 0.017 a,c    0.4



S-Mephenytoin hydroxylase assayed as described in Methods. Reaction
mixtures contained 10 pmol of recombinant CYP2C19 or 50 pmol of other
recombinant CYP2C yeast microsomes, 500 U of purified P450 reductase and 15
µg phospholipid per 50 pmol of P450, and 0.4 mM radioactive substrate in
0.1 M HEPES buffer (pH 7.4). Unless otherwise stated recombinant yeast
microsomes were also reconstituted with a 2:1 molar ratio of cytochrome b5.
Reactions were incubated at 37 °C for 30 min with 1 mM NADPH. Control
reactions contained the same reaction mixture and were incubated similarly
with an equivalent amount of control yeast microsomal protein (1 mg).
Specific content of P450 of the recombinant yeast microsomes ranged from
35-48 pmol/mg except for 2C8 (191 pmol/mg) and 2C19 (17 pmol/mg). Control
liver reactions contained 0.1 mg microsomal protein but were not fortified
with reductase, cytochrome b5, or phospholipid and were incubated with 0.1
M phosphate buffer (pH 7.4). Values represent the means ± SE.



a Activity significantly higher than that of control yeast microsomes, P < 
0.05. Analysis of variance and Fisher's Least Significant difference test. 

b 2C19 activity significantly higher than activities of all other 
recombinant CYP2C proteins or human liver microsomes, P < 0.05. 

c Human liver microsomes significantly higher than recombinant microsomes 
except 2C19, P < 0.05. 

d Significant difference between S- and R-Mephenytoin hydroxylase 
activities, P < 0.05. 
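The fold differences quoted in the text can be recomputed from the S-mephenytoin column of Table III. The sketch below uses the mean values as they appear in the table; the dictionary and function names are hypothetical, and the calculation ignores the standard errors.

```python
# Mean S-mephenytoin 4'-hydroxylase activities (nmol/min/nmol P450), Table III.
S_ACTIVITY = {
    "2C9-Ile359 (65)": 0.043,
    "2C9-Leu359 (25)": 0.031,
    "2C8": 0.037,
    "2C18-Thr385 (29c) + b5": 0.042,
    "2C18-Met385 (6b)": 0.023,
    "2C19 (11a)": 4.6,
    "Human liver microsomes": 0.283,
}


def fold_over(reference, table):
    """Ratio of the reference activity to every other entry in the table."""
    ref = table[reference]
    return {name: ref / value for name, value in table.items() if name != reference}


folds = fold_over("2C19 (11a)", S_ACTIVITY)
recombinant_folds = {k: v for k, v in folds.items() if k != "Human liver microsomes"}
```

On these means, 2C19 exceeds every other recombinant form by more than 100-fold and human liver microsomes by roughly 16-fold, in line with the statements in the text.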
Recombinant CYP2C proteins were purified from yeast
microsomes and their ability to 4'-hydroxylate the S- and
R-enantiomers of mephenytoin was also examined in a
reconstituted system (Table IV). 2C19 had similar turnover
numbers for S-mephenytoin 4'-hydroxylation in the
reconstituted system and in recombinant yeast microsomes
fortified with reductase. This turnover number was at least
10 times higher than that of human liver microsomes, and it
was 50-100 times higher than that of recombinant 2C9, 2C18 or
2C8. The turnover number of recombinant 2C9 was ~100 times
higher than the activity of a preparation of 2C9 purified from
human liver. 4'-hydroxylation of mephenytoin by 2C19 was
stereospecific for the S-enantiomer, while metabolism by 2C9
was not stereospecific. Surprisingly, 2C18 appeared to be
stereoselective for the R-enantiomer of mephenytoin. The
turnover number of 2C19 for S-mephenytoin 4'-hydroxylase was
also ~30 times higher than the turnover numbers reported for a
preparation of P450 MP purified from human liver by Srivastava
et al., Mol. Pharmacol. 40:69-79 (1991) (0.21 nmol/min/nmol
P450).

Although 2C9 exhibits poor catalytic activity toward
S-mephenytoin, this cytochrome appears to be the principal
tolbutamide hydroxylase (Tables IV and V). The turnover
numbers for hydroxylation of tolbutamide by the purified
recombinant 2C9 were somewhat lower than those of 2C9 purified
from human liver in the absence of exogenous reductase. The
Ile359 allele of 2C9 had a 3-fold higher turnover number for
tolbutamide than the Leu359 allele when activities of the
recombinant microsomes were adjusted for P450 content
(Table V). 2C19 also appeared to metabolize tolbutamide at a
rate comparable to that of 2C9, although this rate was
difficult to estimate due to the low specific content of P450
in the recombinant 2C19 yeast clone available at the time of
these assays. The two alleles of 2C18 exhibited lower
tolbutamide hydroxylase activity than 2C9 in recombinant yeast
microsomes.
TABLE V

Tolbutamide Hydroxylase Activities of
Recombinant Human CYP2C Yeast Microsomes

                                 P450 Content    Tolbutamide Hydroxylase Activity
Microsomes                       (pmol/mg)       (nmol/min/mg protein)   (nmol/min/nmol P450)

Control Yeast                    <5              0.3 ± 0.01
2C9-Ile359 (65)                  55              169.8 ± 7.4 a,b         3.4 ± 0.15
2C9-Leu359 (25)                  20              14.8 ± 0.3 a,c          0.99 ± 0.02
2C8                              80              8.5 ± 0.2 a             0.11 ± 0.003
2C18-Asp2 Thr385 (29c-1a)        53              9.3 ± 0.7 a             0.19 ± 0.02
2C18-Asp2 Met385 (6b-9)          34              11.1 ± 1.2 a            0.37 ± 0.04
2C19 (11a-3)                     <7              18.4 ± 2.4 a,d          ND
UC8936 Human Liver Microsomes    227             116 ± 0.8 a             2.3 ± 0.02



Tolbutamide hydroxylase activities were measured as described in Methods.
Reaction mixtures contained 1 mg yeast microsomal protein or 0.2 mg UC8936
human liver microsomal protein (50 pmol P450). Purified P450 reductase
(1,000 units) was included in reactions with yeast microsomes but not human
microsomes. Values are the means ± SE. ND = Not calculated due to low
specific content of 2C19 in yeast in this experiment.

a Significantly higher than control yeast microsomes, P<0.05. Pairwise
comparisons using Fisher's Least Significant Difference test.

b Clone 65 significantly higher than all other clones (P<0.0001).

c Clone 25 significantly greater than 2C8 (P<0.0005).

d Clone 11a significantly higher than 2C8 (P<0.0001).



The data show that CYP2C19 stereospecifically
hydroxylates S-mephenytoin at the 4'-position at a rate which
is at least 10 times higher than the rate in human liver
microsomes. This is the first example of a human CYP protein
which metabolizes S-mephenytoin with a turnover number
appreciably higher than that of human liver microsomes. Other
2C proteins showed 100-fold lower activity than
2C19. One of the 2C9 variants tested (Ile359) is identical to
that reported by Yasumori et al., supra, to show a low level of
S-mephenytoin 4'-hydroxylase activity. The low rate of
4'-hydroxylation of S-mephenytoin by 2C9 detected in the present
study with high specific activity 14C-labeled S-mephenytoin
undoubtedly explains the conflicting reports from various
laboratories concerning the ability of this cytochrome to
metabolize mephenytoin (Yasumori et al., supra; Srivastava et
al., supra; Relling et al., supra).

(d) Comparisons of Immunoblot Analysis of CYP2C
Proteins in Human Livers with Liver Microsomal S-Mephenytoin
4'-Hydroxylase Activities

Microsomes from 16 human liver donor samples
previously assayed for S- and R-mephenytoin 4'-hydroxylase
activities were analyzed for CYP2C proteins by Western blot
analysis (Fig. 10) using an antibody to 2C8 and a polyclonal
antibody to 2C9 and 2C19. Both 2C18 and 2C19 have mobilities
similar to that of the low molecular weight band recognized in
human microsomes by most antibodies to 2C9. However, an
antibody to a 2C19 peptide was specific for 2C19. 2C18 could
not be detected in human liver samples using a peptide
antibody to 2C18 (~5 pmol detection limit), indicating that
this polypeptide is expressed poorly (<50 pmol/mg).

The 2C19 content of liver microsomes was consistent
with their S-mephenytoin 4'-hydroxylase activities (Fig. 10).
In particular, samples 129 and 130 had extremely low
S-mephenytoin 4'-hydroxylase values and low S/R ratios, and 2C19
appeared to be essentially absent in these microsomal samples.
Densitometric analysis of immunoblots revealed that the 2C19
content of the 16 human liver microsomes correlated
significantly with S-mephenytoin 4'-hydroxylase activity
(r=0.718, P<0.005) (Fig. 11), but that the content of 2C9 did
not correlate with this catalytic activity (r=0.49, P>0.05).
There was also a significant correlation between 2C8 content
and S-mephenytoin 4'-hydroxylase activity (r=0.82, P<0.0001).
However, this correlation was probably fortuitous, because 2C8
shows very low S-mephenytoin 4'-hydroxylase activity either in
recombinant form or when purified from human liver.
Alternatively, the correlation may indicate an indirect
regulatory role for 2C8 in controlling S-mephenytoin
4'-hydroxylase activity.
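
The significance calls above can be checked from the reported correlation coefficients alone. The following is a minimal sketch, assuming the P values come from the usual two-sided t-test on a Pearson correlation with n = 16 livers; the source does not state which test was used, and the function names are hypothetical.

```python
import math

# Two-sided 5% critical value of Student's t with 14 degrees of freedom
# (n - 2 for the 16 liver samples).
T_CRIT_DF14 = 2.145


def correlation_t_statistic(r: float, n: int) -> float:
    """t statistic for testing a Pearson correlation r against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def is_significant(r: float, n: int = 16, t_crit: float = T_CRIT_DF14) -> bool:
    """True when |t| exceeds the chosen critical value."""
    return abs(correlation_t_statistic(r, n)) > t_crit


if __name__ == "__main__":
    for label, r in (("2C9 content", 0.49), ("2C8 content", 0.82)):
        t = correlation_t_statistic(r, 16)
        print(f"{label}: r={r}, t={t:.2f}, significant={is_significant(r)}")
```

Under this assumption, r=0.49 falls just short of the 5% threshold and r=0.82 clears it comfortably, which agrees with the P values quoted in the text.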
(e) Sequences of 2C9 and 2C18 mRNAs in Livers with
High or Low S-Mephenytoin 4'-Hydroxylase Activities

2C18 and 2C9 mRNAs from six of the above livers were
amplified by PCR and directly sequenced through areas of known
allelic variation to determine whether there was a
relationship between S-mephenytoin 4'-hydroxylase activity and
the presence of a particular allelic variant (Table VI). When
the total 2C18 PCR products were sequenced, the two
individuals with the highest S-mephenytoin hydroxylase
activity were homozygous for Thr385 (ACG). Of the two
individuals with the lowest activity, one was homozygous for
Met385, and one was heterozygous for Thr/Met385 (AC/TG). Two
individuals with intermediate activity were also homozygous
for Thr385. Similarly, when 2C9 mRNA from these same
individuals was amplified and sequenced through known allelic
variations, sample 108 (low S-mephenytoin 4'-hydroxylase
activity) was heterozygous for C/T at position 430 (coding for
Arg/Cys144), while the other five individuals were homozygous
for C430 (Arg144). On sequencing through bases 1072-1077, all
samples except for 106 (high activity) read TACATT (positions
1072-1077), coding for Tyr358 Ile359. Sample 106 read TACA/CTT,
indicating that it was heterozygous for Ile/Leu359. These data
indicate that there is no relationship between S-mephenytoin
4'-hydroxylase activity of human liver microsomes and the
identity of the allelic variants of 2C18 (Thr/Met385) or 2C9
(Arg/Cys144, Tyr/Cys358, Ile/Leu359) in these tissues.
TABLE VI

Alleles in Human Livers with Varying S-Mephenytoin
4'-Hydroxylase Phenotypes

               S-MPOHase
               (nmol/      Liver   2C18
Phenotype      min/mg)     donor   allele        2C9 allele

High           0.286       106     Thr385        Arg144      His276   Tyr358   Ile/Leu359
High           0.351       115     Thr385        Arg144      His276   Tyr358   Ile359
Intermediate   0.070       118     Thr385        Arg144      His276   Tyr358   Leu359
Intermediate   0.081       123     Thr385        Arg144      His276   Tyr358   Ile359
Low            0.051       108     Thr/Met385    Arg/Cys144  His276   Tyr358   Ile359
Low            0.025       129     Met/Met385    Arg144      His276            Ile359



4. Conclusion

These results show that 2C19 has a turnover number
for the 4'-hydroxylation of S-mephenytoin about 100-fold
higher than that of 2C9, 2C18, or 2C8. 2C19 hydroxylation was
stereospecific for the S-enantiomer. The hepatic content of
2C19 in 16 liver microsomal samples correlated with their
S-mephenytoin 4'-hydroxylase activities. 2C9 appeared to be the
primary tolbutamide hydroxylase, although 2C19 may also
contribute to this catalytic activity. The identity of the
allelic variant of 2C9 or 2C18 did not influence S-mephenytoin
4'-hydroxylase activity. These data strongly indicate that
2C19 is the key determinant of S-mephenytoin 4'-hydroxylase
activity in human liver.

Example 6: Diagnostic Assays for Detecting Individuals
Deficient in S-Mephenytoin 4'-Hydroxylase Activity

Individuals deficient in S-mephenytoin 4'-hydroxylase
activity are identified by analysis of their genomic DNA or
cDNA encoding 2C19.
(a) Analysis of full-length cDNA
Liver microsomes were prepared by standard
differential centrifugation methods (2) from human liver
samples previously characterized as varying markedly in
S-mephenytoin 4'-hydroxylase activity in vitro. Total liver RNA was
isolated from the liver samples with TRI Reagent (Molecular
Research Center, Inc.) and reverse transcribed using random
hexamers as 3' primers. Overlapping CYP2C19 cDNA fragments
from five human liver samples that showed poor metabolism of
S-mephenytoin in vitro were amplified by the polymerase chain
reaction (PCR). PCR was performed on an aliquot of the cDNA
in 1 X PCR buffer (67 mM Tris-HCl pH 8.8, 17 mM (NH4)2SO4, 10
mM β-mercaptoethanol, 7 µM EDTA, 0.2 mg bovine serum
albumin/ml), 50 µM dATP, dCTP, dGTP and dTTP, 0.25 µM of both
PCR primers, 2.5 U AmpliTaq DNA polymerase (Perkin Elmer
Cetus) and 1.0 mM MgCl2. The PCR conditions were: initial
denaturation at 94°C for 3 min; 35 cycles consisting of
denaturation at 94°C for 30 sec, annealing at 53°C for 30 sec
and extension at 72°C for 30 sec; final extension at 72°C for
10 min; using a Perkin Elmer thermocycler. PCR products (20
µl) were analyzed on 3% agarose gels stained with ethidium
bromide.
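
The cycling conditions above can be written down as a small data structure, which makes the program easy to check and to reuse for the genomic assay described later. This is only an illustrative sketch: the class and function names are hypothetical, and the computed time covers programmed hold times only, not instrument ramp times.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


# Values taken from the cDNA amplification conditions in the text.
INITIAL = Step("initial denaturation", 94, 3 * 60)
CYCLE = (
    Step("denaturation", 94, 30),
    Step("annealing", 53, 30),
    Step("extension", 72, 30),
)
N_CYCLES = 35
FINAL = Step("final extension", 72, 10 * 60)


def total_hold_seconds(initial, cycle, n_cycles, final):
    """Sum of programmed hold times (ramping between temperatures is ignored)."""
    return initial.seconds + n_cycles * sum(s.seconds for s in cycle) + final.seconds


if __name__ == "__main__":
    total = total_hold_seconds(INITIAL, CYCLE, N_CYCLES, FINAL)
    print(f"Programmed hold time: {total} s ({total / 60:.1f} min)")
```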

The PCR fragments were purified using Microcon
filters (Amicon Inc.) and sequenced by cycle sequencing
with fluorescence-tagged dye terminators (PRISM,
Applied Biosystems). One partial CYP2C19 cDNA
was isolated which exhibited aberrant splicing of exon 5 (Fig.
12). This cDNA was missing the initial 40 bases of exon 5,
and was also missing a SmaI site (Fig. 12). This deletion
would be predicted to produce an early stop codon resulting in
a truncated defective protein.

(b) Rapid Assay for Identifying 40 bp Deletion in
cDNA

The analysis of full-length cDNAs identified a 40 bp
deletion as a likely cause of S-mephenytoin 4'-hydroxylase
activity deficiency. A rapid assay was therefore devised to
analyze the specific region of a 2C19 cDNA molecule spanning
the 40 bp deletion.

Specific PCR primers were designed to amplify the
region of the CYP2C19 cDNA spanning the deletion (Figs. 12 and
13). mRNA from 13 human livers previously characterized for
extensive or poor metabolism of S-mephenytoin in vitro was
reverse transcribed and amplified by PCR. Liver samples with
the highest S-mephenytoin hydroxylase activity contained only
the normally spliced mRNA. By contrast, sample 35 (a probable
poor metabolizer) produced an amplification product containing
the 40 bp deletion. Samples with intermediate S-mephenytoin
4'-hydroxylase activity and low amounts of CYP2C19 protein
exhibited both the normal 2C19 cDNA and 2C19 cDNA containing
the 40 bp deletion.

(c) Genomic Sequencing of 2C19

Because human tissue samples containing genomic 2C19
DNA are much more easily obtained than samples containing 2C19
mRNA, it is preferable to diagnose a polymorphic defect from
genomic DNA. Genomic DNA was isolated from the blood of human
volunteers previously characterized as poor or extensive
metabolizers of S-mephenytoin in vivo. The in vivo phenotype
of most Swiss subjects was based on a hydroxylation index,
with a value above 5.6 identifying a poor metabolizer (Kupfer
et al., Eur. J. Clin. Pharmacol. 26:753-759 (1984)). The in
vivo phenotype of American, Oriental and one Swiss subject was
based on the urinary S/R ratio (Wedlund et al., Clin.
Pharmacol. Ther. 36:773-780 (1984)), a poor metabolizer (PM)
being defined as having a ratio > 0.95. An extensive
metabolizer (EM) is defined as having a ratio < 0.8. An
intermediate phenotype (IM) has been previously described with
the extent of 4'-hydroxylation being greater than in PMs but
with the rate of metabolite formation being slower than in EMs
(Arns et al., Pharmacologist 32:140 (1990)).
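
The two phenotyping cut-offs quoted above can be expressed as simple threshold functions. This is a minimal sketch rather than a validated clinical rule: the function names are hypothetical, ratios falling between 0.8 and 0.95 are reported as "unclassified" because the text assigns them to neither group, and the label "non-PM" for a hydroxylation index at or below 5.6 is an assumption.

```python
def classify_by_sr_ratio(sr_ratio: float) -> str:
    """Classify from the urinary S/R ratio using the cut-offs in the text."""
    if sr_ratio > 0.95:
        return "PM"          # poor metabolizer
    if sr_ratio < 0.8:
        return "EM"          # extensive metabolizer
    return "unclassified"    # the text defines no category for 0.8-0.95


def classify_by_hydroxylation_index(index: float) -> str:
    """Classify from the hydroxylation index (values above 5.6 indicate a PM)."""
    return "PM" if index > 5.6 else "non-PM"


if __name__ == "__main__":
    for ratio in (0.30, 0.90, 1.00):
        print(ratio, classify_by_sr_ratio(ratio))
    for index in (1.2, 20.0):
        print(index, classify_by_hydroxylation_index(index))
```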

It was believed that the 40 bp deletion identified
in 2C19 cDNA occurred in exon 5, near the border with intron 4,
based on a comparison of the gene structure of CYP2C9 and
CYP2C18 (de Morais et al., supra). Thus, a segment of genomic
2C19 DNA across the intron 4/exon 5 border was amplified to
identify the corresponding genetic defect in genomic DNA. In
the initial assays, the untranslated regions of the genomic
2C19 sequence were not known. However, intron 4 primers could
be designed based on the corresponding sequences from CYP2C9,
which are expected to show about 95% sequence identity based
on comparison with partial genomic sequences of 2C19. The
primer for exon 5 was based on the cDNA sequence of CYP2C19
(see Example 1). The amplified DNA fragment was found to have
the same size in both poor and extensive metabolizers.
However, on restriction analysis, it was found that only the
fragment from extensive metabolizers could be digested with
SmaI. The amplified DNA fragment was sequenced in extensive
and poor metabolizers.

Provision of genomic 2C19 DNA sequence in the intron
4 region allowed the design of a specific intron primer
exhibiting perfect complementarity to the 2C19 DNA sequence in
subsequent experiments. The forward PCR primer from intron 4
was 5'-AATTACAACCAGAGCTTGGC-3' and the reverse primer from
exon 5 was 5'-TATCACTTTCCATAAAAGCAAG-3'. The forward primer
anneals 81 bp upstream of the intron 4/exon 5 junction. PCR
conditions were as for amplification of cDNA except that
reactions used 200 ng of genomic DNA and an initial
denaturation at 96°C for 5 min. PCR products were restricted
with SmaI in the PCR buffer, without purification. Uncut
products had the same size (168 bp) in all samples. Digested
PCR products were analyzed on 4% agarose gels stained with
ethidium bromide.
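
Reading the SmaI digest off the gel amounts to asking whether the uncut product, the two cut fragments, or both are present. The sketch below illustrates that logic under stated assumptions: the fragment sizes (169, 120 and 49 bp) are those reported for the heterozygote pattern in Example 7, the labels "wt" (SmaI site present) and "m" (site absent) are shorthand, and the function name and the size tolerance are hypothetical.

```python
def call_smai_genotype(bands_bp, uncut=169, cut=(120, 49), tol=3):
    """Infer a genotype from the band sizes seen after SmaI digestion.

    bands_bp: iterable of observed band sizes in base pairs.
    tol:      allowed sizing error on an agarose gel (an assumption).
    """
    def seen(size):
        return any(abs(band - size) <= tol for band in bands_bp)

    has_uncut = seen(uncut)
    has_cut = all(seen(size) for size in cut)

    if has_uncut and has_cut:
        return "wt/m"    # one allele cut, one allele resistant
    if has_cut:
        return "wt/wt"   # product fully digested
    if has_uncut:
        return "m/m"     # product resistant to SmaI
    return "no call"     # amplification or digestion failed


if __name__ == "__main__":
    print(call_smai_genotype([120, 49]))
    print(call_smai_genotype([168]))
    print(call_smai_genotype([169, 120, 49]))
```

A band near 168-169 bp alone is ambiguous between a true resistant product and an incomplete digest, so in practice a control digest of known wild-type DNA would be run alongside.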

DNA from 18 unrelated Caucasian extensive
metabolizers and 10 unrelated Caucasian poor metabolizers was
analyzed by this strategy (Fig. 14C). All extensive
metabolizers were either homozygous or heterozygous for the
normal CYP2C19 gene, defined here as CYP2C19wt (wild type).
Among the 10 poor metabolizers, 7 were homozygous for the
defective gene, defined as CYP2C19m (poor mephenytoin
hydroxylation). One poor metabolizer was heterozygous
(CYP2C19wt/CYP2C19m), and two were homozygous for the
wild-type gene (CYP2C19wt/CYP2C19wt), indicating that CYP2C19m
accounted for 15 of 20 alleles tested (75%) in Caucasian poor
metabolizers. The presence of 5 CYP2C19wt alleles in poor
metabolizers suggests that additional mutations may exist in
the Caucasian population, but that CYP2C19m represents the
predominant defect.
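
The allele arithmetic for the Caucasian poor metabolizers can be reproduced directly from the genotype counts given above. This is a small illustrative sketch; the genotype strings and the helper name are shorthand introduced here.

```python
from collections import Counter


def allele_counts(genotypes):
    """Count alleles over a list of diploid genotypes written as 'a/b'."""
    counts = Counter()
    for genotype in genotypes:
        counts.update(genotype.split("/"))
    return counts


# Genotypes of the 10 Caucasian poor metabolizers reported in the text.
caucasian_pm = ["m/m"] * 7 + ["wt/m"] * 1 + ["wt/wt"] * 2

counts = allele_counts(caucasian_pm)
fraction_m = counts["m"] / sum(counts.values())

if __name__ == "__main__":
    print(dict(counts), f"{fraction_m:.0%}")
```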
Segments of DNA spanning the intron 4/exon 5
boundary were also amplified from 17 unrelated Oriental
subjects. Figure 14D shows that 10/17 Oriental poor
metabolizers are homozygous for CYP2C19m, and CYP2C19m
accounts for 25 of 34 alleles (74%) in Oriental poor
metabolizers. All 12 unrelated Oriental extensive
metabolizers were either homozygous or heterozygous for the
CYP2C19wt gene. Thus, the major mutation responsible for the
poor metabolizer phenotype in Oriental subjects is identical
to that found in Caucasians.

The inheritance of CYP2C19m in one Oriental family
previously characterized with respect to the PM trait was also
examined. Figure 14B shows that the poor metabolizer proband
(arrow) and two other related poor metabolizers are homozygous
for CYP2C19m. Two individuals identified earlier as obligate
heterozygotes (family C) (Ward et al., Clin. Pharmacol. Ther.
42:96-99 (1987)) were indeed found to be CYP2C19m/CYP2C19wt.
Thus, the inheritance of the genotype agrees with the
Mendelian autosomal-recessive inheritance of phenotype.

The DNA of three individuals (CYP2C19wt/CYP2C19wt,
CYP2C19m/CYP2C19m, and CYP2C19wt/CYP2C19m) was amplified as
described above and sequenced directly using an automated
sequencer (Applied Biosystems) (Fig. 15). Surprisingly, the
sequence of intron 4 of the defective gene was identical to
that of the normal gene. The only alteration found in
CYP2C19m was a G→A change in exon 5 corresponding to position
681 of the cDNA. This mutation introduces a cryptic splice
site in this exon. This mutation also abolishes a SmaI site at
this position (CCCGGG→CCCAGG). The cryptic splice site
shows slightly greater sequence identity to the consensus
sequence for mammalian splice sites (Green, Annu. Rev. Cell
Biol. 7:559-599 (1991)) than the normal splice site. A second
potential branch point is also seen near the cryptic splice
site. Surprisingly, the cDNA sequences from CYP2C8 and
CYP2C18 have a comparable potential cryptic splice site at the
same point in exon 5 to that of CYP2C19m, but the presence of
the full-length 2C8 protein on immunoblots of human liver
microsomes indicates that the majority of this protein is
spliced correctly.
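
The loss of the SmaI site can be illustrated on a short window of the cDNA. The sketch below uses the ten bases TTCCCGGGAA that appear in the SEQ ID NO:2 listing and assumes that cDNA position 681 is counted from the A of the ATG start codon, which places the window at coding positions 676-685; that numbering convention and the helper names are assumptions made for the example.

```python
SMAI_SITE = "CCCGGG"

# Ten-base window from the SEQ ID NO:2 listing; assumed to span coding
# positions 676-685 when numbering starts at the A of the ATG codon.
WINDOW = "TTCCCGGGAA"
WINDOW_START = 676


def point_mutate(seq: str, index: int, ref: str, alt: str) -> str:
    """Replace one base, checking that the reference base is as expected."""
    if seq[index] != ref:
        raise ValueError(f"expected {ref} at index {index}, found {seq[index]}")
    return seq[:index] + alt + seq[index + 1:]


def has_site(seq: str, site: str = SMAI_SITE) -> bool:
    return site in seq


# G -> A at coding position 681
mutant = point_mutate(WINDOW, 681 - WINDOW_START, "G", "A")

if __name__ == "__main__":
    print(WINDOW, has_site(WINDOW))
    print(mutant, has_site(mutant))
```

The reference-base check in `point_mutate` is a cheap guard against an off-by-one error in the position arithmetic.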

Three of the samples tested by cDNA analysis in
Figure 13 (sample 13, predicted genotype CYP2C19wt/CYP2C19wt;
sample 21, predicted genotype CYP2C19wt/CYP2C19m; and sample
35, predicted genotype CYP2C19m/CYP2C19m) were retested by
genomic analysis. Perfect agreement was observed. The
cryptic splice site appeared to be used exclusively in sample
35, which is a predicted poor metabolizer, and also in liver RNA
of an additional CYP2C19m/CYP2C19m individual. The selection
of the cryptic splice site results in the absence of CYP2C19
in liver microsomes from poor metabolizers (Fig. 13).

(d) Conclusion

The principal genetic defect (CYP2C19m) which is
responsible for the poor metabolism of S-mephenytoin is a G→A
mutation at position 681 of the coding sequence (within exon
5). CYP2C19m accounts for 75% of the defective alleles in
both Caucasian and Oriental poor metabolizers. The single
base change generates a cryptic internal splice site, which is
used exclusively to produce an aberrantly spliced mRNA
containing a 40 bp deletion. The CYP2C19 protein is virtually
absent in livers of poor metabolizers. The mutation at
position 681 is easily detected by PCR amplification of a
segment of genomic 2C19 DNA spanning the mutation.

Example 7: Identification and Diagnostic Assay for a Second
Polymorphism (designated 636) in 2C19

A second mutation, designated the 636 polymorphism
(also known as CYP2C19m2), has been identified. Genomic DNA from an
Oriental poor metabolizer (subject 43 in Example 6) was
amplified by PCR using a forward primer complementary to the
antisense strand of intron 3 extending from bases -79 to -55
and a reverse primer complementary to the sense strand
extending from 79-89 bases into intron 4 (forward primer
5'-TATTATCTGTTAACTAATATGA-3' (SEQ. ID. No. 57) and reverse primer
5'-ACTTCAGGGCTTGGTCAATA-3' (SEQ. ID. No. 58)). These primers
were selected to amplify a 329 base pair product containing
all of exon 4 and the surrounding intron/exon junctions. See
Figure 17. Sequencing of the PCR products with an Applied
Biosystems sequencer identified two mutations in exon 4 of the
Oriental poor metabolizer. A second mutation at nucleotide
636 entailed a G→A transition at the nucleotide level and the
conversion of a tryptophan codon at position 212 (TGG→TGA) to
a premature stop codon. This change would result in a
truncated 211 amino acid polypeptide encoded by only the first
4 exons, which would not contain the heme-binding region and
would be inactive. The change at position 636 also destroys a
BamHI site (GGATCC→GAATCC) (or its isoschizomer BstI) at
positions 635-640.
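
Both consequences of the 636 change, the premature stop codon and the loss of the BamHI site, can be seen on a ten-base window. The sketch below uses the bases CCCTGGATCC from the SEQ ID NO:2 listing, assumed to cover coding positions 631-640 (numbering from the A of ATG); the helper names and the deliberately partial codon table are illustrative only.

```python
BAMHI_SITE = "GGATCC"

# Partial codon table: only the codons needed for this window.
CODONS = {"CCC": "Pro", "TGG": "Trp", "TGA": "Stop", "ATC": "Ile"}

# Assumed to span coding positions 631-640 (codons 211-213 plus one base).
WINDOW = "CCCTGGATCC"
WINDOW_START = 631


def codon_for_residue(seq: str, residue: int, start: int = WINDOW_START) -> str:
    """Return the codon for a residue number from a window of coding sequence."""
    first = 3 * residue - 2          # coding position of the codon's first base
    offset = first - start
    return seq[offset:offset + 3]


def mutate(seq: str, position: int, alt: str, start: int = WINDOW_START) -> str:
    """Substitute one base at a given coding position."""
    i = position - start
    return seq[:i] + alt + seq[i + 1:]


mutant = mutate(WINDOW, 636, "A")

if __name__ == "__main__":
    for name, seq in (("wild type", WINDOW), ("636 mutant", mutant)):
        codon = codon_for_residue(seq, 212)
        print(name, codon, CODONS[codon], "BamHI site:", BAMHI_SITE in seq)
```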

A PCR test was developed using the primers described
above to amplify a 329 base pair product. The PCR product
from the wild-type DNA from extensive metabolizers was cut
with BamHI to yield two expected fragments with sizes of 233
base pairs and 96 base pairs (Fig. 18). The PCR fragment
amplified from the individual with the 636 mutation (i.e.,
Oriental subject #43) could not be restricted, indicating that
he was homozygous for the 636 mutation. Genotyping of 7
Oriental poor metabolizers whose phenotype could not be
explained by the previous 681 mutation indicated that subjects
41 and 43 were homozygous for the 636 mutation, while subjects
36, 48, 11, 69, and 100 were heterozygous, bearing both
636 and 681 mutant alleles. The DNA in homozygous 636 mutant
subjects 41 and 43 was not cut by BamHI. The DNA in the
heterozygotes yielded three bands at 327, 232, and 95 bp. The
DNA from these heterozygotes also yielded three bands on
SmaI digestion (169, 120, and 49 bp), indicating they were also
heterozygous for the 681 mutation (named CYP2C19m).
These data show that the 636 and 681 mutations completely
account for the low phenotypes in all of the Oriental poor
metabolizers of S-mephenytoin tested (17 individuals with 34
alleles).
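
Combining the two restriction assays gives a simple rule for predicting phenotype from genotype. The following is a hedged sketch: it assumes that a subject heterozygous at both sites carries the two mutations on different chromosomes (a compound heterozygote, as described for subjects 36, 48, 11, 69 and 100), it predicts only from the 681 and 636 mutations, and the function names are hypothetical.

```python
# Number of defective alleles implied by the result at a single site.
DEFECTIVE_PER_SITE = {"wt/wt": 0, "wt/m": 1, "m/m": 2}


def defective_allele_count(site_681: str, site_636: str) -> int:
    """Defective CYP2C19 alleles, assuming the two mutations are never in cis."""
    return DEFECTIVE_PER_SITE[site_681] + DEFECTIVE_PER_SITE[site_636]


def predicted_phenotype(site_681: str, site_636: str) -> str:
    """Predict PM when both gene copies carry one of the two known defects."""
    return "PM" if defective_allele_count(site_681, site_636) >= 2 else "EM"


if __name__ == "__main__":
    print(predicted_phenotype("m/m", "wt/wt"))    # homozygous 681 mutant
    print(predicted_phenotype("wt/wt", "m/m"))    # homozygous 636 mutant
    print(predicted_phenotype("wt/m", "wt/m"))    # compound heterozygote
    print(predicted_phenotype("wt/m", "wt/wt"))   # single defective allele
```

A prediction of "EM" from this rule is not conclusive, since the text reports Caucasian poor metabolizers whose defect is explained by neither mutation.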
Three Caucasian poor metabolizers who were not
homozygous for the 681 mutation were also genotyped for the
636 mutation. These were subjects JOB1, 502 and 503. One of
these individuals (JOB1) was heterozygous for the 681 mutation
while the other two did not contain the 681 mutation in either
allele. None of these individuals exhibited a 636 mutation.
Thus, there is probably at least one additional polymorphism
in 2C19 in Caucasians.

In summary, the 681 and 636 mutations explain 100%
of Oriental poor metabolizers, and the 681 mutation alone
accounts for about 75% of Caucasian poor metabolizers.

While the foregoing invention has been described in 
some detail for purposes of clarity and understanding, it will 
be clear to one skilled in the art from a reading of this 
disclosure that various changes in form and detail can be made 
without departing from the true scope of the invention. All 
publications and patent documents cited in this application 
are incorporated by reference in their entirety for all 
purposes to the same extent as if each individual publication 
or patent document were so individually denoted. 
(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 490 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

Met Asp Pro Phe Val Val Leu Val Leu Cys Leu Ser Cys Leu Leu Leu
1 5 10 15

Leu Ser Ile Trp Arg Gln Ser Ser Gly Arg Gly Lys Leu Pro Pro Gly
20 25 30

Pro Thr Pro Leu Pro Val Ile Gly Asn Ile Leu Gln Ile Asp Ile Lys
35 40 45

Asp Val Ser Lys Ser Leu Thr Asn Leu Ser Lys Ile Tyr Gly Pro Val
50 55 60

Phe Thr Leu Tyr Phe Gly Leu Glu Arg Met Val Val Leu His Gly Tyr
65 70 75 80

Glu Val Val Lys Glu Ala Leu Ile Asp Leu Gly Glu Glu Phe Ser Gly
85 90 95

Arg Gly His Phe Pro Leu Ala Glu Arg Ala Asn Arg Gly Phe Gly Ile
100 105 110

Val Phe Ser Asn Gly Lys Arg Trp Lys Glu Ile Arg Arg Phe Ser Leu
115 120 125

Met Thr Leu Arg Asn Phe Gly Met Gly Lys Arg Ser Ile Glu Asp Arg
130 135 140

Val Gln Glu Glu Ala Arg Cys Leu Val Glu Glu Leu Arg Lys Thr Lys
145 150 155 160

Ala Ser Pro Cys Asp Pro Thr Phe Ile Leu Gly Cys Ala Pro Cys Asn
165 170 175

Val Ile Cys Ser Ile Ile Phe Gln Lys Arg Phe Asp Tyr Lys Asp Gln
180 185 190

Gln Phe Leu Asn Leu Met Glu Lys Leu Asn Glu Asn Ile Arg Ile Val
195 200 205

Ser Thr Pro Trp Ile Gln Ile Cys Asn Asn Phe Pro Thr Ile Ile Asp
210 215 220

Tyr Phe Pro Gly Thr His Asn Lys Leu Leu Lys Asn Leu Ala Phe Met
225 230 235 240

Glu Ser Asp Ile Leu Glu Lys Val Lys Glu His Gln Glu Ser Met Asp
245 250 255

Ile Asn Asn Pro Arg Asp Phe Ile Asp Cys Phe Leu Ile Lys Met Glu
260 265 270

Lys Glu Lys Gln Asn Gln Gln Ser Glu Phe Thr Ile Glu Asn Leu Val
275 280 285

Ile Thr Ala Ala Asp Leu Leu Gly Ala Gly Thr Glu Thr Thr Ser Thr
290 295 300

Thr Leu Arg Tyr Ala Leu Leu Leu Leu Leu Lys His Pro Glu Val Thr
305 310 315 320

Ala Lys Val Gln Glu Glu Ile Glu Arg Val Ile Gly Arg Asn Arg Ser
325 330 335

Pro Cys Met Gln Asp Arg Gly His Met Pro Tyr Thr Asp Ala Val Val
340 345 350

His Glu Val Gln Arg Tyr Ile Asp Leu Ile Pro Thr Ser Leu Pro His
355 360 365

Ala Val Thr Cys Asp Val Lys Phe Arg Asn Tyr Leu Ile Pro Lys Gly
370 375 380

Thr Thr Ile Leu Thr Ser Leu Thr Ser Val Leu His Asp Asn Lys Glu
385 390 395 400

Phe Pro Asn Pro Glu Met Phe Asp Pro Arg His Phe Leu Asp Glu Gly
405 410 415

Gly Asn Phe Lys Lys Ser Asn Tyr Phe Met Pro Phe Ser Ala Gly Lys
420 425 430

Arg Ile Cys Val Gly Glu Gly Leu Ala Arg Met Glu Leu Phe Leu Phe
435 440 445

Leu Thr Phe Ile Leu Gln Asn Phe Asn Leu Lys Ser Leu Ile Asp Pro
450 455 460

Lys Asp Leu Asp Thr Thr Pro Val Val Asn Gly Phe Ala Ser Val Pro
465 470 475 480

Pro Phe Tyr Gln Leu Cys Phe Ile Pro Val
485 490
(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1746 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

CTTCAATGGA TCCTTTTGTG GTCCTTGTGC TCTGTCTCTC ATGTTTGCTT CTCCTTTCAA 60

TCTGGAGACA GAGCTCTGGG AGAGGAAAAC TCCCTCCTGG CCCCACTCCT CTCCCAGTGA 120

TTGGAAATAT CCTACAGATA GATATTAAGG ATGTCAGCAA ATCCTTAACC AATCTCTCAA 180

AAATCTATGG CCCTGTGTTC ACTCTGTATT TTGGCCTGGA ACGCATGGTG GTGCTGCATG 240

GATATGAAGT [illegible] [illegible] [illegible] GGAGTTTTCT GGAAGAGGCC 300

ATTTCCCACT GGCTGAAAGA GCTAACAGAG GATTTGGAAT CGTTTTCAGC AATGGAAAGA 360

GATGGAAGGA GATCCGGCGT TTCTCCCTCA TGACGCTGCG GAATTTTGGG ATGGGGAAGA 420

GGAGCATTGA GGACCGTGTT CAAGAGGAAG CCCGCTGCCT TGTGGAGGAG TTGAGAAAAA 480

CCAAGGCTTC ACCCTGTGAT CCCACTTTCA TCCTGGGCTG TGCTCCCTGC AATGTGATCT 540

GCTCCATTAT TTTCCAGAAA CGTTTCGATT ATAAAGATCA GCAATTTCTT AACTTGATGG 600

AAAAATTGAA TGAAAACATC AGGATTGTAA GCACCCCCTG GATCCAGATA TGCAATAATT 660

TTCCCACTAT CATTGATTAT TTCCCGGGAA CCCATAACAA ATTACTTAAA AACCTTGCTT 720

TTATGGAAAG TGATATTTTG GAGAAAGTAA AAGAACACCA AGAATCGATG GACATCAACA 780

ACCCTCGGGA CTTTATTGAT TGCTTCCTGA TCAAAATGGA GAAGGAAAAG CAAAACCAAC 840

AGTCTGAATT CACTATTGAA AACTTGGTAA TCACTGCAGC TGACTTACTT GGAGCTGGGA 900

CAGAGACAAC AAGCACAACC CTGAGATATG CTCTCCTTCT CCTGCTGAAG CACCCAGAGG 960

TCACAGCTAA AGTCCAGGAA GAGATTGAAC GTGTCATTGG CAGAAACCGG AGCCCCTGCA 1020

TGCAGGACAG GGGCCACATG CCCTACACAG ATGCTGTGGT GCACGAGGTC CAGAGATACA 1080

TCGACCTCAT CCCCACCAGC CTGCCCCATG CAGTGACCTG TGACGTTAAA TTCAGAAACT 1140

ACCTCATTCC CAAGGGCACA ACCATATTAA CTTCCCTCAC TTCTGTGCTA CATGACAACA 1200

AAGAATTTCC CAACCCAGAG ATGTTTGACC CTCGTCACTT TCTGGATGAA GGTGGAAATT 1260

TTAAGAAAAG TAACTACTTC ATGCCTTTCT CAGCAGGAAA ACGGATTTGT GTGGGAGAGG 1320

GCCTGGCCCG CATGGAGCTG TTTTTATTCC TGACCTTCAT TTTACAGAAC TTTAACCTGA 1380

AATCTCTGAT TGACCCAAAG GACCTTGACA CAACTCCTGT TGTCAATGGA TTTGCTTCTG 1440

TCCCGCCCTT CTATCAGCTG TGCTTCATTC CTGTCTGAAG AAGCACAGAT GGTCTGGCTG 1500

CTCCTGTGCT GTCCCTGCAG CTCTCTTTCC TCTGGTCCAA ATTTCACTAT CTGTGATGCT 1560

TCTTCTGACC CGTCATCTCA CATTTTCCCT TCCCCCAAGA TCTAGTGAAC ATTCAGCCTC 1620

CATTAAAAAA GTTTCACTGT GCAAATATAT CTGCTATTCC CCATACTCTA TAATAGTTAC 1680

ATTGAGTGCC ACATAATGCT GATACTTGTC TAATGTTGAG TTATTAACAT ATTATTATTA 1740

AATAGA 1746



(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 490 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

Met Asp Ser Leu Val Val Leu Val Leu Cys Leu Ser Cys Leu Leu Leu
1 5 10 15

Leu Ser Leu Trp Arg Gln Ser Ser Gly Arg Gly Lys Leu Pro Pro Gly
20 25 30

Pro Thr Pro Leu Pro Val Ile Gly Asn Ile Leu Gln Ile Gly Ile Lys
35 40 45

Asp Ile Ser Lys Ser Leu Thr Asn Leu Ser Lys Val Tyr Gly Pro Val
50 55 60

Phe Thr Leu Tyr Phe Gly Leu Lys Pro Ile Val Val Leu His Gly Tyr
65 70 75 80

Glu Ala Val Lys Glu Ala Leu Ile Asp Leu Gly Glu Glu Phe Ser Gly
85 90 95

Arg Gly Ile Phe Pro Leu Ala Glu Arg Ala Asn Arg Gly Phe Gly Ile
100 105 110

Val Phe Ser Asn Gly Lys Lys Trp Lys Glu Ile Arg Arg Phe Ser Leu
115 120 125

Met Thr Leu Arg Asn Phe Gly Met Gly Lys Arg Ser Ile Glu Asp Arg
130 135 140

Val Gln Glu Glu Ala Arg Cys Leu Val Glu Glu Leu Arg Lys Thr Lys
145 150 155 160

Ala Ser Pro Cys Asp Pro Thr Phe Ile Leu Gly Cys Ala Pro Cys Asn
165 170 175

Val Ile Cys Ser Ile Ile Phe His Lys Arg Phe Asp Tyr Lys Asp Gln
180 185 190

Gln Phe Leu Asn Leu Met Glu Lys Leu Asn Glu Asn Ile Lys Ile Leu
195 200 205

Ser Ser Pro Trp Ile Gln Ile Cys Asn Asn Phe Ser Pro Ile Ile Asp
210 215 220

Tyr Phe Pro Gly Thr His Asn Lys Leu Leu Lys Asn Val Ala Phe Met
225 230 235 240

Lys Ser Tyr Ile Leu Glu Lys Val Lys Glu His Gln Glu Ser Met Asp
245 250 255

Met Asn Asn Pro Gln Asp Phe Ile Asp Cys Phe Leu Met Lys Met Glu
260 265 270

Lys Glu Lys His Asn Gln Pro Ser Glu Phe Thr Ile Glu Ser Leu Glu
275 280 285

Asn Thr Ala Val Asp Leu Phe Gly Ala Gly Thr Glu Thr Thr Ser Thr
290 295 300

Thr Leu Arg Tyr Ala Leu Leu Leu Leu Leu Lys His Pro Glu Val Thr
305 310 315 320

Ala Lys Val Gln Glu Glu Ile Glu Arg Val Ile Gly Arg Asn Arg Ser
325 330 335

Pro Cys Met Gln Asp Arg Ser His Met Pro Tyr Thr Asp Ala Val Val
340 345 350

His Glu Val Gln Arg Tyr Leu Asp Leu Leu Pro Thr Ser Leu Pro His
355 360 365

Ala Val Thr Cys Asp Ile Lys Phe Arg Asn Tyr Leu Ile Pro Lys Gly
370 375 380

Thr Thr Ile Leu Ile Ser Leu Thr Ser Val Leu His Asp Asn Lys Glu
385 390 395 400

Phe Pro Asn Pro Glu Met Phe Asp Pro His His Phe Leu Asp Glu Gly
405 410 415

Gly Asn Phe Lys Lys Ser Lys Tyr Phe Met Pro Phe Ser Ala Gly Lys
420 425 430

Arg Ile Cys Val Gly Glu Ala Leu Ala Gly Met Glu Leu Phe Leu Phe
435 440 445

Leu Thr Ser Ile Leu Gln Asn Phe Asn Leu Lys Ser Leu Val Asp Pro
450 455 460

Lys Asn Leu Asp Thr Thr Pro Val Val Asn Gly Phe Ala Ser Val Pro
465 470 475 480

Pro Phe Tyr Gln Leu Cys Phe Ile Pro Val
485 490
(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1854 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

GAGAAGGCTT CAATGGATTC TCTTGTGGTC CTTGTGCTCT GTCTCTCATG TTTGCTTCTC 60

CTTTCACTCT GGAGACAGAG CTCTGGGAGA GGAAAACTCC CTCCTGGCCC CACTCCTCTC 120

CCAGTGATTG GAAATATCCT ACAGATAGGT ATTAAGGACA TCAGCAAATC CTTAACCAAT 180

CTCTCAAAGG TCTATGGCCC TGTGTTCACT CTGTATTTTG GCCTGAAACC CATAGTGGTG 240

CTGCATGGAT ATGAAGCAGT GAAGGAAGCC CTGATTGATC TTGGAGAGGA GTTTTCTGGA 300

AGAGGCATTT TCCCACTGGC TGAAAGAGCT AACAGAGGAT TTGGAATTGT TTTCAGCAAT 360

GGAAAGAAAT GGAAGGAGAT CCGGCGTTTC TCCCTCATGA CGCTGCGGAA TTTTGGGATG 420

GGGAAGAGGA GCATTGAGGA CCGTGTTCAA GAGGAAGCCC GCTGCCTTGT GGAGGAGTTG 480

AGAAAAACCA AGGCCTCACC CTGTGATCCC ACTTTCATCC TGGGCTGTGC TCCCTGCAAT 540

GTGATCTGCT CCATTATTTT CCATAAACGT TTTGATTATA AAGATCAGCA ATTTCTTAAC 600

TTAATGGAAA AGTTGAATGA AAACATCAAG ATTTTGAGCA GCCCCTGGAT CCAGATCTGC 660

AATAATTTTT CTCCTATCAT TGATTACTTC CCGGGAACTC ACAACAAATT ACTTAAAAAC 720

GTTGCTTTTA TGAAAAGTTA TATTTTGGAA AAAGTAAAAG AACACCAAGA ATCAATGGAC 780

[illegible] CTCAGGACTT TATTGATTGC [illegible] AAATGGAGAA GGAAAAGCAC 840

AACCAACCAT CTGAATTTAC TATTGAAAGC TTGGAAAACA CTGCAGTTGA CTTGTTTGGA 900

GCTGGGACAG AGACGACAAG CACAACCCTG AGATATGCTC TCCTTCTCCT GCTGAAGCAC 960

CCAGAGGTCA CAGCTAAAGT CCAGGAAGAG ATTGAACGTG TGATTGGCAG AAACCGGAGC 1020

CCCTGCATGC AAGACAGGAG CCACATGCCC TACACAGATG CTGTGGTGCA CGAGGTCCAG 1080

AGATACCTTG ACCTTCTCCC CACCAGCCTG CCCCATGCAG TGACCTGTGA CATTAAATTC 1140

AGAAACTATC TCATTCCCAA GGGCACAACC ATATTAATTT CCCTGACTTC TGTGCTACAT 1200

GACAACAAAG AATTTCCCAA CCCAGAGATG TTTGACCCTC ATCACTTTCT GGATGAAGGT 1260

GGCAATTTTA AGAAAAGTAA ATACTTCATG CCTTTCTCAG CAGGAAAACG GATTTGTGTG 1320

GGAGAAGCCC TGGCCGGCAT GGAGCTGTTT TTATTCCTGA CCTCCATTTT ACAGAACTTT 1380

AACCTGAAAT CTCTGGTTGA CCCAAAGAAC CTTGACACCA CTCCAGTTGT CAATGGTTTT 1440

GCCTCTGTGC CGCCCTTCTA CCAGCTGTGC TTCATTCCTG TCTGAAGAAG AGCAGATGGC 1500

CTGGCTGCTG CTGTGCAGTC CCTGCAGCTC TCTTTCCTCT GGGGCATTAT CCATCTTTCA 1560

CTATCTGTAA TGCCTTTTCT CACCTGTCAT CTCACATTTT CCCTTCCCTG AAGATCTAGT 1620

GAACATTCGA CCTTCATTAC GGAGAGTTTC CTATGTTTCA CTGTGCAAAT ATATCTGCTA 1680

TTCTCCATAC TCTGTAACAG TTGCATTGAC TGTCACATAA TGCTCATACT TATCTAATGT 1740

TGAGTTATTA ATATGTTATT ATTAAATAGA GAAATATGAT TTGTGTATTA TAATTCAAAG 1800

GCATTTCTTT TCTGCATGTT CTAAATAAAA AGCATTATTA TTTGCTGAAA AAAA 1854

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 490 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

Met Asp Pro Ala Val Ala Leu Val Leu Cys Leu Ser Cys Leu Phe Leu
1 5 10 15

Leu Ser Leu Trp Arg Gln Ser Ser Gly Arg Gly Arg Leu Pro Ser Gly
20 25 30

Pro Thr Pro Leu Pro Ile Ile Gly Asn Ile Leu Gln Leu Asp Val Lys
35 40 45

Asp Met Ser Lys Ser Leu Thr Asn Phe Ser Lys Val Tyr Gly Pro Val
50 55 60

Phe Thr Val Tyr Phe Gly Leu Lys Pro Ile Val Val Leu His Gly Tyr
65 70 75 80

Glu Ala Val Lys Glu Ala Leu Ile Asp His Gly Glu Glu Phe Ser Gly
85 90 95

Arg Gly Ser Phe Pro Val Ala Glu Lys Val Asn Lys Gly Leu Gly Ile
100 105 110

Leu Phe Ser Asn Gly Lys Arg Trp Lys Glu Ile Arg Arg Phe Cys Leu
115 120 125

Met Thr Leu Arg Asn Phe Gly Met Gly Lys Arg Ser Ile Glu Asp Arg
130 135 140

Val Gln Glu Glu Ala Arg Cys Leu Val Glu Glu Leu Arg Lys Thr Asn
145 150 155 160

Ala Ser Pro Cys Asp Pro Thr Phe Ile Leu Gly Cys Ala Pro Cys Asn
165 170 175

Val Ile Cys Ser Val Ile Phe His Asp Arg Phe Asp Tyr Lys Asp Gln
180 185 190

Arg Phe Leu Asn Leu Met Glu Lys Phe Asn Glu Asn Leu Arg Ile Leu
195 200 205

Ser Ser Pro Trp Ile Gln Val Cys Asn Asn Phe Pro Ala Leu Ile Asp
210 215 220

Tyr Leu Pro Gly Ser His Asn Lys Ile Ala Glu Asn Phe Ala Tyr Ile
225 230 235 240

Lys Ser Tyr Val Leu Glu Arg Ile Lys Glu His Gln Glu Ser Leu Asp
245 250 255

Met Asn Ser Ala Arg Asp Phe Ile Asp Cys Phe Leu Ile Lys Met Glu
260 265 270

Gln Glu Lys His Asn Gln Gln Ser Glu Phe Thr Val Glu Ser Leu Ile
275 280 285

Ala Thr Val Thr Asp Met Phe Gly Ala Gly Thr Glu Thr Thr Ser Thr
290 295 300

Thr Leu Arg Tyr Gly Leu Leu Leu Leu Leu Lys Tyr Pro Glu Val Thr
305 310 315 320

Ala Lys Val Gln Glu Glu Ile Glu Cys Val Val Gly Arg Asn Arg Ser
325 330 335

Pro Cys Met Gln Asp Arg Ser His Met Pro Tyr Thr Asp Ala Val Val
340 345 350

His Glu Ile Gln Arg Tyr Ile Asp Leu Leu Pro Thr Asn Leu Pro His
355 360 365

Ala Val Thr Cys Asp Val Lys Phe Lys Asn Tyr Leu Ile Pro Lys Gly
370 375 380

Thr Thr Ile Ile Thr Ser Leu Thr Ser Val Leu His Asn Asp Lys Glu
385 390 395 400

Phe Pro Asn Pro Glu Met Phe Asp Pro Gly His Phe Leu Asp Lys Ser
405 410 415

Gly Asn Phe Lys Lys Ser Asp Tyr Phe Met Pro Phe Ser Ala Gly Lys
420 425 430

Arg Met Cys Met Gly Glu Gly Leu Ala Arg Met Glu Leu Phe Leu Phe
435 440 445

Leu Thr Thr Ile Leu Gln Asn Phe Asn Leu Lys Ser Gln Val Asp Pro
450 455 460

Lys Asp Ile Asp Ile Thr Pro Ile Ala Asn Ala Phe Gly Arg Val Pro
465 470 475 480

Pro Leu Tyr Gln Leu Cys Phe Ile Pro Val
485 490

(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2009 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:



GGCACCGGAA AGAACAAGAA AAAAGAACAC CTTATTTTTA TCTTCTTCAG TGAGCCAATG 60

TTCATTCAAA AGAGAGATTA AAGTGCTTTT TGCTGACTAG TCACAGTCAG AGTCAGAATC 120

ACAGGTGGAT TAGTAGGGAG TGTTATAAAA GCCTTGAAGT GAAAGCCCGC AGTTGTCTTA 180

CTAAGAAGAG AAGCCTTCAA TGGATCCAGC TGTGGCTCTG GTGCTCTGTC TCTCCTGTTT 240

GTTTCTCCTT TCACTCTGGA GGCAGAGCTC TGGAAGAGGG AGGCTCCCGT CTGGCCCCAC 300

TCCTCTCCCG ATTATTGGAA ATATCCTGCA GTTAGATGTT AAGGACATGA GCAAATCCTT 360

AACCAATTTC TCAAAAGTCT ATGGCCCTGT GTTCACTGTG TATTTTGGCC TGAAGCCCAT 420

TGTGGTGTTG CATGGATATG AAGCAGTGAA GGAGGCCCTG ATTGATCATG GAGAGGAGTT 480

TTCTGGAAGA GGAAGTTTTC CAGTGGCTGA AAAAGTTAAC AAAGGACTTG GAATCCTTTT 540

CAGCAATGGA AAGAGATGGA AGGAGATCCG GCGTTTCTGC CTCATGACTC TGCGGAATTT 600

TGGGATGGGG AAGAGGAGCA TCGAGGACCG TGTTCAAGAG GAAGCCCGCT GCCTTGTGGA 660

GGAGTTGAGA AAAACCAATG CCTCACCCTG TGATCCCACT TTCATCCTGG GCTGTGCTCC 720

CTGCAATGTG ATCTGCTCTG TTATTTTCCA TGATCGATTT GATTATAAAG ATCAGAGGTT 780

TCTTAACTTG ATGGAAAAAT TCAATGAAAA CCTCAGGATT CTGAGCTCTC CATGGATCCA 840

GGTCTGCAAT AATTTCCCTG CTCTCATCGA TTATCTCCCA GGAAGTCATA ATAAAATAGC 900

TGAAAATTTT GCTTACATTA AAAGTTATGT ATTGGAGAGA ATAAAAGAAC ATCAAGAATC 960

CCTGGACATG AACAGTGCTC GGGACTTTAT TGATTGTTTC CTGATCAAAA TGGAACAGGA 1020

AAAGCACAAT CAACAGTCTG AATTTACTGT TGAAAGCTTG ATAGCCACTG TAACTGATAT 1080

GTTTGGGGCT GGAACAGAGA CAACGAGCAC CACTCTGAGA TATGGACTCC TGCTCCTGCT 1140

GAAGTACCCA GAGGTCACAG CTAAAGTCCA GGAAGAGATT GAATGTGTAG TTGGCAGAAA 1200

CCGGAGCCCC TGTATGCAGG ACAGGAGTCA CATGCCCTAC ACAGATGCTG TGGTGCACGA 1260

GATCCAGAGA TACATTGACC TCCTCCCCAC CAACCTGCCC CATGCAGTGA CCTGTGATGT 1320

TAAATTCAAA AACTACCTCA TCCCCAAGGG CACGACCATA ATAACATCCC TGACTTCTGT 1380

GCTGCACAAT GACAAAGAAT TCCCCAACCC AGAGATGTTT GACCCTGGCC ACTTTCTGGA 1440

TAAGAGTGGC AACTTTAAGA AAAGTGACTA CTTCATGCCT TTCTCAGCAG GAAAACGGAT 1500

GTGTATGGGA GAGGGCCTGG CCCGCATGGA GCTGTTTTTA TTCCTGACCA CCATTTTGCA 1560

GAACTTTAAC CTGAAATCTC AGGTTGACCC AAAGGATATT GACATCACCC CCATTGCCAA 1620

TGCATTTGGT CGTGTGCCAC CCTTGTACCA GCTCTGCTTC ATTCCTGTCT GAAGAAGGGC 1680

AGATAGTTTG GCTGCTCCTG TGCTGTCACC TGCAATTCTC CCTTATCAGG GCCATTAGCC 1740

TCTCCCTTCT CTCTGTGAGG GATATTTTCT CTGACTTGTC AATCCACATC TTCCCATTCC 1800

CTCAAGATCC AATGAACATC CAACCTCCAT TAAAGAGAGT TTCTTGGGTC ACTTCCTAAA 1860

TATATCTGCT ATTCTCCATA CTCTGTATCA CTTGTATTGA CCACCACATA TGCTAATACC 1920

TATCTACTGC TGAGTTGTCA GTATGTTATC ACTAGAAAAC AAAGAAAAAT GATTAATAAA 1980

TGACAATTCA GAGCCAAAAA AAAAAAAAA 2009



(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 490 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7: 

Met Glu Pro Phe Val Val Leu Val Leu Cys Leu Ser Phe Met Leu Leu
1 5 10 15

Phe Ser Leu Trp Arg Gln Ser Cys Arg Arg Arg Lys Leu Pro Pro Gly
20 25 30

Pro Thr Pro Leu Pro Ile Ile Gly Asn Met Leu Gln Ile Asp Val Lys
35 40 45

Asp Ile Cys Lys Ser Phe Thr Asn Phe Ser Lys Val Tyr Gly Pro Val
50 55 60

Phe Thr Val Tyr Phe Gly Met Asn Pro Ile Val Val Phe His Gly Tyr
65 70 75 80

Glu Ala Val Lys Glu Ala Leu Ile Asp Asn Gly Glu Glu Phe Ser Gly
85 90 95

Arg Gly Asn Ser Pro Ile Ser Gln Arg Ile Thr Lys Gly Leu Gly Ile
100 105 110

Ile Ser Ser Asn Gly Lys Arg Trp Lys Glu Ile Arg Arg Phe Ser Leu
115 120 125

Thr Asn Leu Arg Asn Phe Gly Met Gly Lys Arg Ser Ile Glu Asp Arg
130 135 140

Val Gln Glu Glu Ala His Cys Leu Val Glu Glu Leu Arg Lys Thr Lys
145 150 155 160

Ala Ser Pro Cys Asp Pro Thr Phe Ile Leu Gly Cys Ala Pro Cys Asn
165 170 175

Val Ile Cys Ser Val Val Phe Gln Lys Arg Phe Asp Tyr Lys Asp Gln
180 185 190

Asn Phe Leu Thr Leu Met Lys Arg Phe Asn Glu Asn Phe Arg Ile Leu
195 200 205

Asn Ser Pro Trp Ile Gln Val Cys Asn Asn Phe Pro Leu Leu Ile Asp
210 215 220

Cys Phe Pro Gly Thr His Asn Lys Val Leu Lys Asn Val Ala Leu Thr
225 230 235 240

Arg Ser Tyr Ile Arg Glu Lys Val Lys Glu His Gln Ala Ser Leu Asp
245 250 255

Val Asn Asn Pro Arg Asp Phe Met Asp Cys Phe Leu Ile Lys Met Glu
260 265 270

Gln Glu Lys Asp Asn Gln Lys Ser Glu Phe Asn Ile Glu Asn Leu Val
275 280 285

Gly Thr Val Ala Asp Leu Phe Val Ala Gly Thr Glu Thr Thr Ser Thr
290 295 300

Thr Leu Arg Tyr Gly Leu Leu Leu Leu Leu Lys His Pro Glu Val Thr
305 310 315 320

Ala Lys Val Gln Glu Glu Ile Asp His Val Ile Gly Arg His Arg Ser
325 330 335

Pro Cys Met Gln Asp Arg Ser His Met Pro Tyr Thr Asp Ala Val Val
340 345 350

His Glu Ile Gln Arg Tyr Ser Asp Leu Val Pro Thr Gly Val Pro His
355 360 365

Ala Val Thr Thr Asp Thr Lys Phe Arg Asn Tyr Leu Ile Pro Lys Gly
370 375 380

Thr Thr Ile Met Ala Leu Leu Thr Ser Val Leu His Asp Asp Lys Glu
385 390 395 400

Phe Pro Asn Pro Asn Ile Phe Asp Pro Gly His Phe Leu Asp Lys Asn
405 410 415

Gly Asn Phe Lys Lys Ser Asp Tyr Phe Met Pro Phe Ser Ala Gly Lys
420 425 430

Arg Ile Cys Ala Gly Glu Gly Leu Ala Arg Met Glu Leu Phe Leu Phe
435 440 445

Leu Thr Thr Ile Leu Gln Asn Phe Asn Leu Lys Ser Val Asp Asp Leu
450 455 460

Lys Asn Leu Asn Thr Thr Ala Val Thr Lys Gly Ile Val Ser Leu Pro
465 470 475 480

Pro Ser Tyr Gln Ile Cys Phe Ile Pro Val
485 490

(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1829 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:



AATGGAACCT TTTGTGGTCC TGGTGCTGTG TCTCTCTTTT ATGCTTCTCT TTTCACTCTG 60

GAGACAGAGC TGTAGGAGAA GGAAGCTCCC TCCTGGCCCC ACTCCTCTTC CTATTATTGG 120

AAATATGCTA CAGATAGATG TTAAGGACAT CTGCAAATCT TTCACCAATT TCTCAAAAGT 180

CTATGGTCCT GTGTTCACCG TGTATTTTGG CATGAATCCC ATAGTGGTGT TTCATGGATA 240

TGAGGCAGTG AAGGAAGCCC TGATTGATAA TGGAGAGGAG TTTTCTGGAA GAGGCAATTC 300

CCCAATATCT CAAAGAATTA CTAAAGGACT TGGAATCATT TCCAGCAATG GAAAGAGATG 360

GAAGGAGATC CGGCGTTTCT CCCTCACAAA CTTGCGGAAT TTTGGGATGG GGAAGAGGAG 420

CATTGAGGAC CGTGTTCAAG AGGAAGCTCA CTGCCTTGTG GAGGAGTTGA GAAAAACCAA 480

GGCTTCACCC TGTGATCCCA CTTTCATCCT GGGCTGTGCT CCCTGCAATG TGATCTGCTC 540

CGTTGTTTTC CAGAAACGAT TTGATTATAA AGATCAGAAT TTTCTCACCC TGATGAAAAG 600

ATTCAATGAA AACTTCAGGA TTCTGAACTC CCCATGGATC CAGGTCTGCA ATAATTTCCC 660

TCTACTCATT GATTGTTTCC CAGGAACTCA CAACAAAGTG CTTAAAAATG TTGCTCTTAC 720

ACGAAGTTAC ATTAGGGAGA AAGTAAAAGA ACACCAAGCA TCACTGGATG TTAACAATCC 780

TCGGGACTTT ATGGATTGCT TCCTGATCAA AATGGAGCAG GAAAAGGACA ACCAAAAGTC 840

AGAATTCAAT ATTGAAAACT TGGTTGGCAC TGTAGCTGAT CTATTTGTTG CTGGAACAGA 900

GACAACAAGC ACCACTCTGA GATATGGACT CCTGCTCCTG CTGAAGCACC CAGAGGTCAC 960

AGCTAAAGTC CAGGAAGAGA TTGATCATGT AATTGGCAGA CACAGGAGCC CCTGCATGCA 1020

GGATAGGAGC CACATGCCTT ACACTGATGC TGTAGTGCAC GAGATCCAGA GATACAGTGA 1080

CCTTGTCCCC ACCGGTGTGC CCCATGCAGT GACCACTGAT ACTAAGTTCA GAAACTACCT 1140

CATCCCCAAG GGCACAACCA TAATGGCATT ACTGACTTCC GTGCTACATG ATGACAAAGA 1200

ATTTCCTAAT CCAAATATCT TTGACCCTGG CCACTTTCTA GATAAGAATG GCAACTTTAA 1260

GAAAAGTGAC TACTTCATGC CTTTCTCAGC AGGAAAACGA ATTTGTGCAG GAGAAGGACT 1320

TGCCCGCATG GAGCTATTTT TATTTCTAAC CACAATTTTA CAGAACTTTA ACCTGAAATC 1380

TGTTGATGAT TTAAAGAACC TCAATACTAC TGCAGTTACC AAAGGGATTG TTTCTCTGCC 1440

ACCCTCATAC CAGATCTGCT TCATCCCTGT CTGAAGAATG CTAGCCCATC TGGCTGCTGA 1500

TCTGCTATCA CCTGCAACTC TTTTTTTATC AAGGACATTC CCACTATTAT GTCTTCTCTG 1560

ACCTCTCATC AAATCTTCCC ATTCACTCAA TATCCCATAA GCATCCAAAC TCCATTAAGG 1620

AGAGTTGTTC AGGTCACTGC ACAAATATAT CTGCAATTAT TCATACTCTG TAACACTTGT 1680

ATTAATTGCT GCATATGCTA ATACTTTTCT AATGCTGACT TTTTAATATG TTATCACTGT 1740

AAAACACAGA AAAGTGATTA ATGAATGATA ATTTAGTCCA TTTCTTTTGT GAATGTGCTA 1800

AATAAAAAGT GTTATTAATT GCTGGTTCA 1829

(2) INFORMATION FOR SEQ ID NO:9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 490 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9: 

Met Asp Ser Leu Val Val Leu Val Leu Cys Leu Ser Cys Leu Leu Leu
1 5 10 15

Leu Ser Leu Trp Arg Gln Ser Ser Gly Arg Gly Lys Leu Pro Pro Gly
20 25 30

Pro Thr Pro Leu Pro Val Ile Gly Asn Ile Leu Gln Ile Gly Ile Lys
35 40 45

Asp Ile Ser Lys Ser Leu Thr Asn Leu Ser Lys Val Tyr Gly Pro Val
50 55 60

Phe Thr Leu Tyr Phe Gly Leu Lys Pro Ile Val Val Leu His Gly Tyr
65 70 75 80

Glu Ala Val Lys Glu Ala Leu Ile Asp Leu Gly Glu Glu Phe Ser Gly
85 90 95

Arg Gly Ile Phe Pro Leu Ala Glu Arg Ala Asn Arg Gly Phe Gly Ile
100 105 110

Val Phe Ser Asn Gly Lys Lys Trp Lys Glu Ile Arg Arg Phe Ser Leu
115 120 125

Met Thr Leu Arg Asn Phe Gly Met Gly Lys Arg Ser Ile Glu Asp Arg
130 135 140

Val Gln Glu Glu Ala Arg Cys Leu Val Glu Glu Leu Arg Lys Thr Lys
145 150 155 160

Ala Ser Pro Cys Asp Pro Thr Phe Ile Leu Gly Cys Ala Pro Cys Asn
165 170 175

Val Ile Cys Ser Ile Ile Phe His Lys Arg Phe Asp Tyr Lys Asp Gln
180 185 190

Gln Phe Leu Asn Leu Met Glu Lys Leu Asn Glu Asn Ile Lys Ile Leu
195 200 205

Ser Ser Pro Trp Ile Gln Ile Cys Asn Asn Phe Ser Pro Ile Ile Asp
210 215 220

Tyr Phe Pro Gly Thr His Asn Lys Leu Leu Lys Asn Val Ala Phe Met
225 230 235 240

Lys Ser Tyr Ile Leu Glu Lys Val Lys Glu His Gln Glu Ser Met Asp
245 250 255

Met Asn Asn Pro Gln Asp Phe Ile Asp Cys Phe Leu Met Lys Met Glu
260 265 270

Lys Glu Lys His Asn Gln Pro Ser Glu Phe Thr Ile Glu Ser Leu Glu
275 280 285

Asn Thr Ala Val Asp Leu Phe Gly Ala Gly Thr Glu Thr Thr Ser Thr
290 295 300

Thr Leu Arg Tyr Ala Leu Leu Leu Leu Leu Lys His Pro Glu Val Thr
305 310 315 320

Ala Lys Val Gln Glu Glu Ile Glu Arg Val Ile Gly Arg Asn Arg Ser
325 330 335

Pro Cys Met Gln Asp Arg Ser His Met Pro Tyr Thr Asp Ala Val Val
340 345 350

His Glu Val Gln Arg Tyr Ile Asp Leu Leu Pro Thr Ser Leu Pro His
355 360 365

Ala Val Thr Cys Asp Ile Lys Phe Arg Asn Tyr Leu Ile Pro Lys Gly
370 375 380

Thr Thr Ile Leu Ile Ser Leu Thr Ser Val Leu His Asp Asn Lys Glu
385 390 395 400

Phe Pro Asn Pro Glu Met Phe Asp Pro His His Phe Leu Asp Glu Gly
405 410 415

Gly Asn Phe Lys Lys Ser Lys Tyr Phe Met Pro Phe Ser Ala Gly Lys
420 425 430

Arg Ile Cys Val Gly Glu Ala Leu Ala Gly Met Glu Leu Phe Leu Phe
435 440 445

Leu Thr Ser Ile Leu Gln Asn Phe Asn Leu Lys Ser Leu Val Asp Pro
450 455 460

Lys Asn Leu Asp Thr Thr Pro Val Val Asn Gly Phe Ala Ser Val Pro
465 470 475 480

Pro Phe Tyr Gln Leu Cys Phe Ile Pro Val
485 490

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1852 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:



GAAGGCTTCA ATGGATTCTC TTGTGGTCCT TGTGCTCTGT CTCTCATGTT TGCTTCTCCT 60

TTCACTCTGG AGACAGAGCT CTGGGAGAGG AAAACTCCCT CCTGGCCCCA CTCCTCTCCC 120

AGTGATTGGA AATATCCTAC AGATAGGTAT TAAGGACATC AGCAAATCCT TAACCAATCT 180

CTCAAAGGTC TATGGCCCTG TGTTCACTCT GTATTTTGGC CTGAAACCCA TAGTGGTGCT 240

GCATGGATAT GAAGCAGTGA AGGAAGCCCT GATTGATCTT GGAGAGGAGT TTTCTGGAAG 300

AGGCATTTTC CCACTGGCTG AAAGAGCTAA CAGAGGATTT GGAATTGTTT TCAGCAATGG 360

AAAGAAATGG AAGGAGATCC GGCGTTTCTC CCTCATGACG CTGCGGAATT TTGGGATGGG 420

GAAGAGGAGC ATTGAGGACC GTGTTCAAGA GGAAGCCCGC TGCCTTGTGG AGGAGTTGAG 480

AAAAACCAAG GCCTCACCCT GTGATCCCAC TTTCATCCTG GGCTGTGCTC CCTGCAATGT 540

GATCTGCTCC ATTATTTTCC ATAAACGTTT TGATTATAAA GATCAGCAAT TTCTTAACTT 600

AATGGAAAAG TTGAATGAAA ACATCAAGAT TTTGAGCAGC CCCTGGATCC AGATCTGCAA 660

TAATTTTTCT CCTATCATTG ATTACTTCCC GGGAACTCAC AACAAATTAC TTAAAAACGT 720

TGCTTTTATG AAAAGTTATA TTTTGGAAAA AGTAAAAGAA CACCAAGAAT CAATGGACAT 780

GAACAACCCT CAGGACTTTA TTGATTGCTT CCTGATGAAA ATGGAGAAGG AAAAGCACAA 840

CCAACCATCT GAATTTACTA TTGAAAGCTT GGAAAACACT GCAGTTGACT TGTTTGGAGC 900

TGGGACAGAG ACGACAAGCA CAACCCTGAG ATATGCTCTC CTTCTCCTGC TGAAGCACCC 960

AGAGGTCACA GCTAAAGTCC AGGAAGAGAT TGAACGTGTG ATTGGCAGAA ACCGGAGCCC 1020

CTGCATGCAA GACAGGAGCC ACATGCCCTA CACAGATGCT GTGGTGCACG AGGTCCAGAG 1080

ATACATTGAC CTTCTCCCCA CCAGCCTGCC CCATGCAGTG ACCTGTGACA TTAAATTCAG 1140

AAACTATCTC ATTCCCAAGG GCACAACCAT ATTAATTTCC CTGACTTCTG TGCTACATGA 1200

CAACAAAGAA TTTCCCAACC CAGAGATGTT TGACCCTCAT CACTTTCTGG ATGAAGGTGG 1260

CAATTTTAAG AAAAGTAAAT ACTTCATGCC TTTCTCAGCA GGAAAACGGA TTTGTGTGGG 1320

AGAAGCCCTG GCCGGCATGG AGCTGTTTTT ATTCCTGACC TCCATTTTAC AGAACTTTAA 1380

CCTGAAATCT CTGGTTGACC CAAAGAACCT TGACACCACT CCAGTTGTCA ATGGATTTGC 1440

CTCTGTGCCG CCCTTCTACC AGCTGTGCTT CATTCCTGTC TGAAGAAGAG CAGATGGCCT 1500

GGCTGCTGCT GTGCAGTCCC TGCAGCTCTC TTTCCTCTGG GGCATTATCC ATCTTTCACT 1560

ATCTGTAATG CCTTTTCTCA CCTGTCATCT CACATTTTCC CTTCCCTGAA GATCTAGTGA 1620

ACATTCGACC TCCATTACGG AGAGTTTCCT ATGTTTCACT GTGCAAATAT ATCTGCTATT 1680

CTCCATACTC TGTAACAGTT GCATTGACTG TCACATAATG CTCATACTTA TCTAATGTTG 1740

AGTTATTAAT ATGTTATTAT TAAATAGAGA AATATGATTT GTGTATTATA ATTCAAAGGC 1800

ATTTCTTTTC TGCATGTTCT AAATAAAAAG CATTATTATT TGCTGAAAAA AA 1852

(2) INFORMATION FOR SEQ ID NO:11:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 490 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

Met Asp Pro Ala Val Ala Leu Val Leu Cys Leu Ser Cys Leu Phe Leu
1 5 10 15

Leu Ser Leu Trp Arg Gln Ser Ser Gly Arg Gly Arg Leu Pro Ser Gly
20 25 30

Pro Thr Pro Leu Pro Ile Ile Gly Asn Ile Leu Gln Leu Asp Val Lys
35 40 45

Asp Met Ser Lys Ser Leu Thr Asn Phe Ser Lys Val Tyr Gly Pro Val
50 55 60

Phe Thr Val Tyr Phe Gly Leu Lys Pro Ile Val Val Leu His Gly Tyr
65 70 75 80

Glu Ala Val Lys Glu Ala Leu Ile Asp His Gly Glu Glu Phe Ser Gly
85 90 95

Arg Gly Ser Phe Pro Val Ala Glu Lys Val Asn Lys Gly Leu Gly Ile
100 105 110

Leu Phe Ser Asn Gly Lys Arg Trp Lys Glu Ile Arg Arg Phe Cys Leu
115 120 125

Met Thr Leu Arg Asn Phe Gly Met Gly Lys Arg Ser Ile Glu Asp Arg
130 135 140

Val Gln Glu Glu Ala Arg Cys Leu Val Glu Glu Leu Arg Lys Thr Asn
145 150 155 160

Ala Ser Pro Cys Asp Pro Thr Phe Ile Leu Gly Cys Ala Pro Cys Asn
165 170 175

Val Ile Cys Ser Val Ile Phe His Asp Arg Phe Asp Tyr Lys Asp Gln
180 185 190

Arg Phe Leu Asn Leu Met Glu Lys Phe Asn Glu Asn Leu Arg Ile Leu
195 200 205

Ser Ser Pro Trp Ile Gln Val Cys Asn Asn Phe Pro Ala Leu Ile Asp
210 215 220

Tyr Leu Pro Gly Ser His Asn Lys Ile Ala Glu Asn Phe Ala Tyr Ile
225 230 235 240

Lys Ser Tyr Val Leu Glu Arg Ile Lys Glu His Gln Glu Ser Leu Asp
245 250 255

Met Asn Ser Ala Arg Asp Phe Ile Asp Cys Phe Leu Ile Lys Met Glu
260 265 270

Gln Glu Lys His Asn Gln Gln Ser Glu Phe Thr Val Glu Ser Leu Ile
275 280 285

Ala Thr Val Thr Asp Met Phe Gly Ala Gly Thr Glu Thr Thr Ser Thr
290 295 300

Thr Leu Arg Tyr Gly Leu Leu Leu Leu Leu Lys Tyr Pro Glu Val Thr
305 310 315 320

Ala Lys Val Gln Glu Glu Ile Glu Cys Val Val Gly Arg Asn Arg Ser
325 330 335

Pro Cys Met Gln Asp Arg Ser His Met Pro Tyr Thr Asp Ala Val Val
340 345 350

His Glu Ile Gln Arg Tyr Ile Asp Leu Leu Pro Thr Asn Leu Pro His
355 360 365

Ala Val Thr Cys Asp Val Lys Phe Lys Asn Tyr Leu Ile Pro Lys Gly
370 375 380

Met Thr Ile Ile Thr Ser Leu Thr Ser Val Leu His Asn Asp Lys Glu
385 390 395 400

Phe Pro Asn Pro Glu Met Phe Asp Pro Gly His Phe Leu Asp Lys Ser
405 410 415

Gly Asn Phe Lys Lys Ser Asp Tyr Phe Met Pro Phe Ser Ala Gly Lys
420 425 430

Arg Met Cys Met Gly Glu Gly Leu Ala Arg Met Glu Leu Phe Leu Phe
435 440 445

Leu Thr Thr Ile Leu Gln Asn Phe Asn Leu Lys Ser Gln Val Asp Pro
450 455 460

Lys Asp Ile Asp Ile Thr Pro Ile Ala Asn Ala Phe Gly Arg Val Pro
465 470 475 480

Pro Leu Tyr Gln Leu Cys Phe Ile Pro Val
485 490

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2258 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:

AGTGAAAGCC CGCAGTTGTC TTACTAAGAA GAGAAGCCTT CAATGGATCC AGCTGTGGCT 60

CTGGTGCTCT GTCTCTCCTG TTTGTTTCTC CTTTCACTCT GGAGGCAGAG CTCTGGAAGA 120

GGGAGGCTCC CGTCTGGCCC CACTCCTCTC CCGATTATTG GAAATATCCT GCAGTTAGAT 180

GTTAAGGACA TGAGCAAATC CTTAACCAAT TTCTCAAAAG TCTATGGCCC TGTGTTCACT 240

GTGTATTTTG GCCTGAAGCC CATTGTGGTG TTGCATGGAT ATGAAGCAGT GAAGGAGGCC 300

CTGATTGATC ATGGAGAGGA GTTTTCTGGA AGAGGAAGTT TTCCAGTGGC TGAAAAAGTT 360

AACAAAGGAC TTGGAATCCT TTTCAGCAAT GGAAAGAGAT GGAAGGAGAT CCGGCGTTTC 420

TGCCTCATGA CTCTGCGGAA TTTTGGGATG GGGAAGAGGA GCATCGAGGA CCGTGTTCAA 480

GAGGAAGCCC GCTGCCTTGT GGAGGAGTTG AGAAAAACCA ATGCCTCACC CTGTGATCCC 540

ACTTTCATCC TGGGCTGTGC TCCCTGCAAT GTGATCTGCT CTGTTATTTT CCATGATCGA 600

TTTGATTATA AAGATCAGAG GTTTCTTAAC TTGATGGAAA AATTCAATGA AAACCTCAGG 660

ATTCTGAGCT CTCCATGGAT CCAGGTCTGC AATAATTTCC CTGCTCTCAT CGATTATCTC 720

CCAGGAAGTC ATAATAAAAT AGCTGAAAAT TTTGCTTACA TTAAAAGTTA TGTATTGGAG 780

AGAATAAAAG AACATCAAGA ATCCCTGGAC ATGAACAGTG CTCGGGACTT TATTGATTGT 840

TTCCTGATCA AAATGGAACA GGAAAAGCAC AATCAACAGT CTGAATTTAC TGTTGAAAGC 900

TTGATAGCCA CTGTAACTGA TATGTTTGGG GCTGGAACAG AGACAACGAG CACCACTCTG 960

AGATATGGAC TCCTGCTCCT GCTGAAGTAC CCAGAGGTCA CAGCTAAAGT CCAGGAAGAG 1020

ATTGAATGTG TAGTTGGCAG AAACCGGAGC CCCTGTATGC AGGACAGGAG TCACATGCCC 1080

TACACAGATG CTGTGGTGCA CGAGATCCAG AGATACATTG ACCTCCTCCC CACCAACCTG 1140

CCCCATGCAG TGACCTGTGA TGTTAAATTC AAAAACTACC TCATCCCCAA GGGCATGACC 1200

ATAATAACAT CCCTGACTTC TGTGCTGCAC AATGACAAAG AATTCCCCAA CCCAGAGATG 1260

TTTGACCCTG GCCACTTTCT GGATAAGAGT GGCAACTTTA AGAAAAGTGA CTACTTCATG 1320

CCTTTCTCAG CAGGAAAACG GATGTGTATG GGAGAGGGCC TGGCCCGCAT GGAGCTGTTT 1380

TTATTCCTGA CCACCATTTT GCAGAACTTT AACCTGAAAT CTCAGGTTGA CCCAAAGGAT 1440

ATTGACATCA CCCCCATTGC CAATGCATTT GGTCGTGTGC CACCCTTGTA CCAGCTCTGC 1500

TTCATTCCTG TCTGAAGAAG GGCAGATAGT TTGGCTGCTC CTGTGCTGTC ACCTGCAATT 1560

CTCCCTTATC AGGGCCATTG GCCTCTCCCT TCTCTCTATG AGGGATATTT TCTCTGACTT 1620

GTCAATCCAC ATCTTCCCAT TCCCTCAAGA TCCAATGAAC ATCCAACCTC CATTAAAGAG 1680

AGTTTCTTGG GTCACTTCCT AAATATATCT GCTATTCTCC ATACTCTGTA TCACTTGTAT 1740

TGACCACCAC ATATGCTAAT ACCTATCTAC TGCTGAGTTG TCAGTATGTT ATCACTATAA 1800

AACAAAGAAA AATGATTAAT AAATGACAAT TCAGAGCCAT TTATTCTCTG CATGCTCTAG 1860

ATAAAAATGA TTATTATTTA CTGGGTCAGT TCTTAGATTT CTTTCTTTTG AGTAAAATGA 1920

AAGTAAGAAA TGAAAGAAAA TAGAATGTGA AGAGGCTGTG CTGGCCCTCA TAGTGTTAAG 1980

CACAAAAAGG GAGAAAGGTA AGAGGGTAGG AAAGCTGTTT TAGCTAAATG CCACCTAGAG 2040

TTATTGGAGG TCTGAATTTG GAAAAAAAAA CTATGTCCAG GAGCAGCTGT AACCTGTAGG 2100

GAAATAATGG AACAATCATC CATAAGAGGG ATGAACATTA AGTGTTTGAA TTCATGCTCT 2160

GCTTTTGTGT TACTGTAAAC ACAAGATCAA GATTTGGATA ATCTTTTTCC TTTGTGTTTC 2220

CAACTTAGAT CATGTCTAAA TATATGCTTT CATATGGC 2258

(2) INFORMATION FOR SEQ ID NO:13:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 490 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 
(iii) HYPOTHETICAL: YES 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

Met Asp Pro Xaa Val Val Leu Val Leu Cys Leu Ser Cys Leu Leu Leu
1 5 10 15

Leu Ser Leu Trp Arg Gln Ser Ser Gly Arg Gly Lys Leu Pro Pro Gly
20 25 30

Pro Thr Pro Leu Pro Xaa Ile Gly Asn Ile Leu Gln Ile Asp Xaa Lys
35 40 45

Asp Ile Ser Lys Ser Leu Thr Asn Xaa Ser Lys Val Tyr Gly Pro Val
50 55 60

Phe Thr Xaa Tyr Phe Gly Leu Lys Pro Ile Val Val Leu His Gly Tyr
65 70 75 80

Glu Ala Val Lys Glu Ala Leu Ile Asp Leu Gly Glu Glu Phe Ser Gly
85 90 95

Arg Gly Xaa Phe Pro Leu Ala Glu Arg Ala Asn Xaa Gly Xaa Gly Ile
100 105 110

Val Phe Ser Asn Gly Lys Arg Trp Lys Glu Ile Arg Arg Phe Ser Leu
115 120 125

Met Thr Leu Arg Asn Phe Gly Met Gly Lys Arg Ser Ile Glu Asp Arg
130 135 140

Val Gln Glu Glu Ala Arg Cys Leu Val Glu Glu Leu Arg Lys Thr Lys
145 150 155 160

Ala Ser Pro Cys Asp Pro Thr Phe Ile Leu Gly Cys Ala Pro Cys Asn
165 170 175

Val Ile Cys Ser Xaa Ile Phe His Lys Arg Phe Asp Tyr Lys Asp Gln
180 185 190

Gln Phe Leu Asn Leu Met Glu Lys Xaa Asn Glu Asn Ile Arg Ile Leu
195 200 205

Ser Ser Pro Trp Ile Gln Xaa Cys Asn Asn Phe Pro Xaa Xaa Ile Asp
210 215 220

Tyr Phe Pro Gly Thr His Asn Lys Leu Leu Lys Asn Val Ala Phe Met
225 230 235 240

Lys Ser Tyr Ile Leu Glu Lys Val Lys Glu His Gln Glu Ser Xaa Asp
245 250 255

Met Asn Asn Pro Arg Asp Phe Ile Asp Cys Phe Leu Ile Lys Met Glu
260 265 270

Xaa Glu Lys His Asn Gln Gln Ser Glu Phe Thr Ile Glu Ser Leu Xaa
275 280 285

Xaa Thr Xaa Xaa Asp Leu Phe Gly Ala Gly Thr Glu Thr Thr Ser Thr
290 295 300

Thr Leu Arg Tyr Xaa Leu Leu Leu Leu Leu Lys His Pro Glu Val Thr
305 310 315 320

Ala Lys Val Gln Glu Glu Ile Glu Arg Val Ile Gly Arg Asn Arg Ser
325 330 335

Pro Cys Met Gln Asp Arg Ser His Met Pro Tyr Thr Asp Ala Val Val
340 345 350

His Glu Xaa Gln Arg Tyr Ile Asp Leu Leu Pro Thr Ser Leu Pro His
355 360 365

Ala Val Thr Cys Asp Val Lys Phe Arg Asn Tyr Leu Ile Pro Lys Gly
370 375 380

Thr Thr Ile Leu Thr Ser Leu Thr Ser Val Leu His Asp Xaa Lys Glu
385 390 395 400

Phe Pro Asn Pro Glu Met Phe Asp Pro Gly His Phe Leu Asp Xaa Gly
405 410 415

Gly Asn Phe Lys Lys Ser Asp Tyr Phe Met Pro Phe Ser Ala Gly Lys
420 425 430

Arg Ile Cys Val Gly Glu Gly Leu Ala Arg Met Glu Leu Phe Leu Phe
435 440 445

Leu Thr Thr Ile Leu Gln Asn Phe Asn Leu Lys Ser Leu Val Asp Pro
450 455 460

Lys Xaa Leu Asp Thr Thr Pro Val Val Asn Gly Phe Ala Ser Val Pro
465 470 475 480

Pro Phe Tyr Gln Leu Cys Phe Ile Pro Val
485 490



(2) INFORMATION FOR SEQ ID NO:14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1892 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: YES 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

AGTGAAAGCC CGCAGTTGTC TTACTAAGAA GAGAAGNCTT CAATGGATCC TNTTGTGGTC 60 

CTNGTGCTCT GTCTCTCATG TTTGCTTCTC CTTTCACTCT GGAGACAGAG CTCTGGGAGA 120 

GGNAANCTCC CTCCTGGCCC CACTCCTCTC CCANTNATTG GAAATATCCT ACAGATAGAT 180 

NTTAAGGACA TCAGCAAATC CTTAACCAAT NTCTCAAAAG TCTATGGCCC TGTGTTCACT 240

NTGTATTTTG GCCTGAAACC CATAGTGGTG NTGCATGGAT ATGAAGCAGT GAAGGAAGCC 300 

CTGATTGATC NTGGAGAGGA GTTTTCTGGA AGAGGCANTT TCCCACTGGC TGAAAGAGNT 360 

AACANAGGAN TTGGAATCGT TTTCAGCAAT GGAAAGAGAT GGAAGGAGAT CCGGCGTTTC 420

TCCCTCATGA [illegible] TTTTGGGATG GGGAAGAGGA GCATTGAGGA [illegible] 480

GAGGAAGCCC GCTGCCTTGT [illegible] AGAAAAACCA AGGCCTCACC [illegible] 540

[illegible] [illegible] [illegible] GTGATCTGCT CCNTTATTTT [illegible] 600

[illegible] [illegible] [illegible] TTGATGGAAA AATTNAATGA [illegible] 660

[illegible] [illegible] [illegible] AATAATTTNC CTCCTNTCAT TGATTATTTC 720

[illegible] [illegible] ACTTAAAAAN GTTGCTTTTA TGAAAAGTTA TATTTTGGAG 780

AAAGTAAAAG [illegible] [illegible] ATGAACAANC CTCGGGACTT [illegible] 840

[illegible] AAATGGAGNA [illegible] AACCAACAGT CTGAATTTAC TATTGAAAGC 900

[illegible] [illegible] [illegible] GCTGGNACAG AGACAACAAG [illegible] 960

[illegible] [illegible] [illegible] CCAGAGGTCA CAGCTAAAGT [illegible] 1020

[illegible] [illegible] [illegible] CCCTGCATGC AGGACAGGAG [illegible] 1080

TACACAGATG CTGTGGTGCA CGAGNTCCAG AGATACATTG ACCTNCTCCC CACCAGCCTG 1140

CCCCATGCAG TGACCTGTGA NNTTAAATTC AGAAACTACC TCATNCCCAA GGGCACAACC 1200

ATANTAACNT CCCTGACTTC TGTGCTACAT GANNACAAAG AATTTCCCAA CCCAGAGATG 1260

TTTGACCCTN GNCACTTTCT GGATNANNGT GGCAANTTTA AGAAAAGTNA CTACTTCATG 1320

CCTTTCTCAG CAGGAAAACG GATTTGTGTG GGAGANGGCC TGGCCCGCAT GGAGCTGTTT 1380

TTATTCCTGA CCNCCATTTT ACAGAACTTT AACCTGAAAT CTCTGGTTGA CCCAAANGAC 1440

CTTGACACCA CTCCAGTTGN CAATGGATTT GCTTCTGTGC CNCCCTTCTA CCAGCTNTGC 1500

TTCATTCCTG TCTGAAGAAG GGCAGATGGT CTGGCTGCTN CTGTGCTGTC NCNNNNNNTN 1560

NNTTTNNTCT GGGGCAATTT CCNTCTTNCA TNNNTNTTNN TGCNNTTTNT CATCTGNCAT 1620

CTCACANTNC NNCTTCCCTT ANCATCNAGN NACCATTNAN NNNCAATNTC CAAGAGNGTG 1680

NNTTTNTTNN CTNTCCACCT ANATCTATCN NTNNNNCTNC TNTNTNTNNA TNACTTTGAT 1740

TGTCCNCTAN TGATGNTAAT TNTTTAATAT TGNNTTATTG NNANNNTNTT ATNANTNANA 1800

AANAAATGAT AATTNTNTNN AAATNNNAAG TCANTGCNNT TNANNATNTN CNNAATAAAA 1860

AGCATTATTA TTTGCTGAAA AAAAGTCAGT TC 1892



(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:
GCAAGCTTAA AAAATGGATC CAGCTGTGGC TCT 33

(2) INFORMATION FOR SEQ ID NO:16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 28 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:
GCAAGCTTGC CAAACTATCT GCCCTTCT 28 



(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
ACTTTTCAAT GTAAGCAAAT 20 



(2) INFORMATION FOR SEQ ID NO:18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:18: 
TTAGTAATTC TTTGAGATAT 20 



(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
CTGTTAGCTC TTTCAGCCAG 20 

(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
GGAGCACAGC CCAGGATGAA 20



(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 
GCAAGCTTAA AAAATGGATC CAGCTGTGGC TCT 33 



(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY : linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22: 
GCAAGCTTGC CAAACTATCT GCCCTTCT 28 



(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 
TGGCCCTGAT AAGGGAGAAT 20 

(2) INFORMATION FOR SEQ ID NO:24:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24:
ATCCAGAGAT ACATTGACCT C 21



(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE : DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25: 
CCATGAAGTG ACCTGTGATG 20 



(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 
AAAGATGGAT AATGCCCCAG 20 



(2) INFORMATION FOR SEQ ID NO:27: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 
GAAGGAGATC CGGCGTTTCT 20 

(2) INFORMATION FOR SEQ ID NO: 28:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28:
GGCGTTTCTC CCTCATGACG 20

(2) INFORMATION FOR SEQ ID NO:29:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29:
TTGTCATTGT GCAG 14 

(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 
CACATGCCCT ACACA 15 

(2) INFORMATION FOR SEQ ID NO:31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:31:
TGACGCTGCG GAATT 15

(2) INFORMATION FOR SEQ ID NO:32:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE : nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 
GGACTTTATT GATTG 15



(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33: 
ATGATTCTCT TGTGGTCCT 19



(2) INFORMATION FOR SEQ ID NO:34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:34: 
AAAGATGGAT AATGCCCCCA G 21

(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 34 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:35: 
GCAAGCTTAA AAAAATGGAA CCTTTTGTGG TCCT 34



WO 95/30766 



PCT/US95/05744



(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 
GCAAGCTTGC CAGATGGGCT AGCATTCT 28 



(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 34 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (primer)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:37:
GCAAGCTTAA AAAAATGGAT TCTCTTGTGG TCCT 34

(2) INFORMATION FOR SEQ ID NO: 38:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:38: 
GCAAGCTTGC CAGGCCATCT GCTCTTCT 28 



(2) INFORMATION FOR SEQ ID NO:39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 34 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (primer)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:39: 
GCAAGCTTAA AAAAATGGAT TCTCTTGTGG TCCT 34 

(2) INFORMATION FOR SEQ ID NO: 40:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:40: 
GCAAGCTTGC CAGACCATCT GTGCTTCT 28 



(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 14 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligo) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:41: 
AGCTTAAAAA AATG 14 

(2) INFORMATION FOR SEQ ID NO:42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligo) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 
GATCCATTTT TTTA 14 

(2) INFORMATION FOR SEQ ID NO:43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY : unknown 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:

Cys Ile Asp Tyr Leu Pro Gly Ser His Asn Lys Ile Ala Glu Asn Phe
1 5 10 15

Ala

(2) INFORMATION FOR SEQ ID NO: 44:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : unknown 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: 

Cys Leu Ala Phe Met Glu Ser Asp Ile Leu Glu Lys Val Lys
1 5 10

(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 284 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: CDS
(B) LOCATION: 2..283

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:

A TTG AAT GAA AAC ATC AGG ATT GTA AGC ACC CCC TGG ATC CAG ATA 46
Leu Asn Glu Asn Ile Arg Ile Val Ser Thr Pro Trp Ile Gln Ile
1 5 10 15

TGC AAT AAT TTT CCC ACT ATC ATT GAT TAT TTC CCG GGA ACC CAT AAC 94
Cys Asn Asn Phe Pro Thr Ile Ile Asp Tyr Phe Pro Gly Thr His Asn
20 25 30

AAA TTA CTT AAA AAC CTT GCT TTT ATG GAA AGT GAT ATT TTG GAG AAA 142
Lys Leu Leu Lys Asn Leu Ala Phe Met Glu Ser Asp Ile Leu Glu Lys
35 40 45

GTA AAA GAA CAC CAA GAA TCG ATG GAC ATC AAC AAC CCT CGG GAC TTT 190
Val Lys Glu His Gln Glu Ser Met Asp Ile Asn Asn Pro Arg Asp Phe
50 55 60

ATT GAT TGC TTC CTG ATC AAA ATG GAG AAG GAA AAG CAA AAC CAA CAG 238
Ile Asp Cys Phe Leu Ile Lys Met Glu Lys Glu Lys Gln Asn Gln Gln
65 70 75

TCT GAA TTC ACT ATT GAA AAC TTG GTA ATC ACT GCA GCT GAC TTA 283
Ser Glu Phe Thr Ile Glu Asn Leu Val Ile Thr Ala Ala Asp Leu
80 85 90

C 284 

(2) INFORMATION FOR SEQ ID NO:46:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 94 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:46:

Leu Asn Glu Asn Ile Arg Ile Val Ser Thr Pro Trp Ile Gln Ile Cys
1 5 10 15

Asn Asn Phe Pro Thr Ile Ile Asp Tyr Phe Pro Gly Thr His Asn Lys
20 25 30

Leu Leu Lys Asn Leu Ala Phe Met Glu Ser Asp Ile Leu Glu Lys Val
35 40 45

Lys Glu His Gln Glu Ser Met Asp Ile Asn Asn Pro Arg Asp Phe Ile
50 55 60

Asp Cys Phe Leu Ile Lys Met Glu Lys Glu Lys Gln Asn Gln Gln Ser
65 70 75 80

Glu Phe Thr Ile Glu Asn Leu Val Ile Thr Ala Ala Asp Leu
85 90

(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 244 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 44..103

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:47:

ATTGAATGAA AACATCAGGA TTGTAAGCAC CCCCTGGATC CAG GAA CCC ATA ACA 55

Glu Pro Ile Thr
1

AAT TAC TTA AAA ACC TTG CTT TTA TGG AAA GTG ATA TTT TGG AGA AAG 103
Asn Tyr Leu Lys Thr Leu Leu Leu Trp Lys Val Ile Phe Trp Arg Lys
5 10 15 20

TAAAAGAACA CCAAGAATCG ATGGACATCA ACAACCCTCG GGACTTTATT GATTGCTTCC 163

TGATCAAAAT GGAGAAGGAA AAGCAAAACC AACAGTCTGA ATTCACTATT GAAAACTTGG 223

TAATCACTGC AGCTGACTTA C 244

(2) INFORMATION FOR SEQ ID NO: 48:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48: 

Glu Pro Ile Thr Asn Tyr Leu Lys Thr Leu Leu Leu Trp Lys Val Ile
1 5 10 15

Phe Trp Arg Lys 
20 
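
The relationship between the two cDNA listings above (SEQ ID NO:45, 284 bases, and SEQ ID NO:47, 244 bases) can be checked with a short script. The following is a minimal sketch, not part of the original disclosure: it assumes the standard genetic code and assumes, from comparing the two listings, that the shorter sequence corresponds to the longer one with bases 44 to 83 removed. The names `translate`, `WILDTYPE` and `MUTANT` are illustrative only.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate in frame from the first base, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# SEQ ID NO:45 as listed (spaces mark the codon grouping used in the listing).
WILDTYPE = (
    "A TTG AAT GAA AAC ATC AGG ATT GTA AGC ACC CCC TGG ATC CAG ATA "
    "TGC AAT AAT TTT CCC ACT ATC ATT GAT TAT TTC CCG GGA ACC CAT AAC "
    "AAA TTA CTT AAA AAC CTT GCT TTT ATG GAA AGT GAT ATT TTG GAG AAA "
    "GTA AAA GAA CAC CAA GAA TCG ATG GAC ATC AAC AAC CCT CGG GAC TTT "
    "ATT GAT TGC TTC CTG ATC AAA ATG GAG AAG GAA AAG CAA AAC CAA CAG "
    "TCT GAA TTC ACT ATT GAA AAC TTG GTA ATC ACT GCA GCT GAC TTA C"
).replace(" ", "")

# Assumption: the 244-base listing equals the 284-base listing with
# bases 44..83 (1-based) removed, i.e. a 40-base deletion.
MUTANT = WILDTYPE[:43] + WILDTYPE[83:]

wildtype_protein = translate(WILDTYPE[1:283])  # CDS at 2..283
mutant_protein = translate(MUTANT[43:])        # CDS starts at base 44

print(len(WILDTYPE), len(MUTANT))
print(wildtype_protein)
print(mutant_protein)
```

Under these assumptions the first translation reproduces the 94-residue sequence of SEQ ID NO:46, and the second reads through the junction in a shifted frame and terminates early, which is consistent with the 20-residue sequence of SEQ ID NO:48.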

(2) INFORMATION FOR SEQ ID NO:49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 83 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(ix) FEATURE: 

(A) NAME/KEY: intron 

(B) LOCATION: 1..32 

(ix) FEATURE: 

(A) NAME/KEY: exon 

(B) LOCATION: 33.. 83 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:49: 
TTTTAATTTA ATAAATTATT GTTTTCTCTT AGATATGCAA TAATTTTCCC ACTATCATTG 60

ATTATTTCCC GGGAACCCAT AAC 83 

(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 83 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(ix) FEATURE: 

(A) NAME/KEY: intron 

(B) LOCATION: 1..72

(ix) FEATURE:

(A) NAME/KEY: exon

(B) LOCATION: 73..83

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50:
TTTTAATTTA ATAAATTATT GTTTTCTCTT AGATATGCAA TAATTTTCCC ACTATCATTG 60
ATTATTTCCA AGGAACCCAT AAC 83 

(2) INFORMATION FOR SEQ ID NO: 51:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 826 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:51: 



ATGGTGATGT AGNAANTCAT NCCATCTTAT ATTTCNAGAG TGTAGAGGAG GATTGTTGNG 60
GAAGTAAGAG GNNTAAGATA GAGATGCNTT TATACTATCC CAAGCAGGGA TRAGTCTAGG 120
AAATGATTAT CGTCTTTGAT TCTCTTGTCA GRATTTTCTT TCTCMNATCT TGTATAATCA 180
GAGAATTACT ACACATGGAC AATRAARATT TCCCCNTCCA GATANACAAT ATATTTTATT 240
TATATTTATA GTTTTAAATT ACAACCAGAG CTTGGCATAT TGTATCTATA CCTTTAATAA 300
ATGCTTTTAA TTTAATAAAT TATTGTTTTC TCTTAGATAT GCAATAATTT TCCCACTATC 360
ATTGATTATT TCCCGGGAAC CCATAACAAA TTACTTAAAA ACCTTGCTTT TATGGAAAGT 420
GATATTTTGG AGAAAGTAAA AGAACACCAA GAATCGATGG ACATCAACAA CCCTCGGGAC 480
TTTATTGATT GCTTCCTGAT CAAAATGGAG AAGGTAAAAT GTTAACAAAA GCTTAGTTAT 540
GTGACTGCTT GCGTATKTGT GATTCATTGA CTAGTTGKGT GTTTACTACG GATGTTTAAC 600
AGGTCAAGGA GTAATGCTTG AGAAGCATAT TTAAGTTTTT ATTGTATGCA TGAATATCCA 660
GTAAGCATCA TAGAAAATGT AAAATTAANT TGTTAAATAA TTAGAATACA TAGAAGAAAT 720
TGTTTAGATA AATATNATCT ATCTGAACAA TAAGGATGTC AGGATAGGAA AAGCTCTGTT 780
TCTGCAGCTT CCAGTGGAGA TCAGCACAGG AGGGAACTTA TTTTTT 826



(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 655 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: CDS

(B) LOCATION: 263..421

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52: 

AGGGAAAAGA CAAATAGGCC GGGGATGNAA ATTTAGCATG TGAGCAACCT TANTTAACCA 60

GCTAGGCTGT AATTGNTAAT TCGAGANTAA TGTNAAAGTG ATGTGTTGAT TTTATGCATG 120

CCNNACTCNT TTTTGCTTTT AAGGGGAGTC ATAGGTAAGA TATTACTTAA AATTTCTAAA 180

CTATTATTAT CTGTTAACTA ATATGAAGTG TTTTATATCT AATGTTTACT CATATTTTAA 240

AATTGTTTCC AATCATTTAG CT TCA CCC TGT GAT CCC ACT TTC ATC CTG GGC 292
Ser Pro Cys Asp Pro Thr Phe Ile Leu Gly
1 5 10

TGT GCT CCC TGC AAT GTG ATC TGC TCC ATT ATT TTC CAG AAA CGT TTC 340
Cys Ala Pro Cys Asn Val Ile Cys Ser Ile Ile Phe Gln Lys Arg Phe
15 20 25

GAT TAT AAA GAT CAG CAA TTT CTT AAC TTG ATG GAA AAA TTG AAT GAA 388
Asp Tyr Lys Asp Gln Gln Phe Leu Asn Leu Met Glu Lys Leu Asn Glu
30 35 40

AAC ATC AGG ATT GTA AGC ACC CCC TGG ATC CAG GTAAGGACA AGTTTTGTGC 440
Asn Ile Arg Ile Val Ser Thr Pro Trp Ile Gln
45 50

TTCCTGAGAA ACCACTTACA GTCTTTTTTT CTGGGAAATC CAAAATTCTA TATTGACCAA 500

GCCCTGAAGT ACATTTGTGA ATACTACAGT CTTGCCTAGA CAGCCATGGG GTGAATATCT 560

GGAAAAGATG GCAAAGNTCT TTATTTTATG CACAGGAAAT GAATATCCCA ATATAGATCA 620

GGCTTCTAAG CCCATTAGCT CCCTGATCAG TGTTT 655

(2) INFORMATION FOR SEQ ID NO:53: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 53 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53:

Ser Pro Cys Asp Pro Thr Phe Ile Leu Gly Cys Ala Pro Cys Asn Val
1 5 10 15

Ile Cys Ser Ile Ile Phe Gln Lys Arg Phe Asp Tyr Lys Asp Gln Gln
20 25 30

Phe Leu Asn Leu Met Glu Lys Leu Asn Glu Asn Ile Arg Ile Val Ser
35 40 45

Thr Pro Trp Ile Gln
50

(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 292 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:54: 
ATGAAGTGTT TTATATCTAA TGTTTACTCA TATTTTAAAA TTGTTTCCAA TCATTTAGCT 60
TCACCCTGTG ATCCCACTTT CATCCTGGGC TGTGCTCCCT GCAATGTGAT CTGCTCCATT 120
ATTTTCCAGA AACGTTTCGA TTATAAAGAT CAGCAATTTC TTAACTTGAT GGAAAAATTG 180
AATGAAAACA TCAGGATTGT AAGCACCCCC TGAATCCAGG TAAGGACAAG TTTTGTGCTT 240
CCTGAGAAAC CACTTACAGT CTTTTTTTCT GGGAAATCCA AAATTCTATA TT 292



(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (primer)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55: 
AATTACAACC AGAGCTTGGC 20 



(2) INFORMATION FOR SEQ ID NO:56: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:56:
TATCACTTTC CATAAAAGCA AG 22 



(2) INFORMATION FOR SEQ ID NO: 57: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:57: 
TATTATCTGT TAACTAACTA ATATGA 26 



(2) INFORMATION FOR SEQ ID NO:58: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (primer)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:58:
ACTTCAGGGC TTGGTCAATA 20

(2) INFORMATION FOR SEQ ID NO: 59: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:59: 
ATTGAATGAA AACATCAGGA TTG 23 

(2) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60: 
GTAAGTCAGC TGCAGTGATT A 21 
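
As a consistency check on the two primers just listed, the sketch below compares SEQ ID NO:59 and SEQ ID NO:60 with the ends of the cDNA fragments of SEQ ID NO:45 and SEQ ID NO:47. It is an illustrative sketch only; the helper `reverse_complement` and the variable names are assumptions introduced here, and the end fragments are copied from the listings above rather than from any external source.

```python
def reverse_complement(dna):
    """Return the reverse complement of an unambiguous DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


FORWARD_PRIMER = "ATTGAATGAAAACATCAGGATTG"  # SEQ ID NO:59 (23 bases)
REVERSE_PRIMER = "GTAAGTCAGCTGCAGTGATTA"    # SEQ ID NO:60 (21 bases)

# First 30 and last 21 bases shared by SEQ ID NO:45 and SEQ ID NO:47.
CDNA_5_PRIME_END = "ATTGAATGAAAACATCAGGATTGTAAGCAC"
CDNA_3_PRIME_END = "TAATCACTGCAGCTGACTTAC"

# The forward primer should be identical to the start of the sense strand.
forward_matches = CDNA_5_PRIME_END.startswith(FORWARD_PRIMER)

# The reverse primer should be the reverse complement of the sense-strand end.
reverse_matches = reverse_complement(REVERSE_PRIMER) == CDNA_3_PRIME_END

print(forward_matches, reverse_matches)
```

If both checks hold, the primer pair spans the full listed fragment, so amplification with this pair would be expected to give products matching the listed lengths of 284 and 244 bases for SEQ ID NO:45 and SEQ ID NO:47 respectively.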

(2) INFORMATION FOR SEQ ID NO: 61: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 826 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:61: 

ATGGTGATGT AGNAANTCAT NCCATCTTAT ATTTCNAGAG TGTAGAGGAG GATTGTTGNG 60

GAAGTAAGAG GNNTAAGATA GAGATGCNTT TATACTATCC CAAGCAGGGA TRAGTCTAGG 120

AAATGATTAT CGTCTTTGAT TCTCTTGTCA GRATTTTCTT TCTCMNATCT TGTATAATCA 180

GAGAATTACT ACACATGGAC AATRAARATT TCCCCNTCCA GATANACAAT ATATTTTATT 240

TATATTTATA GTTTTAAATT ACAACCAGAG CTTGGCATAT TGTATCTATA CCTTTAATAA 300

ATGCTTTTAA TTTAATAAAT TATTGTTTTC TCTTAGATAT GCAATAATTT TCCCACTATC 360

ATTGATTATT TCCCAGGAAC CCATAACAAA TTACTTAAAA ACCTTGCTTT TATGGAAAGT 420

GATATTTTGG AGAAAGTAAA AGAACACCAA GAATCGATGG ACATCAACAA CCCTCGGGAC 480

TTTATTGATT GCTTCCTGAT CAAAATGGAG AAGGTAAAAT GTTAACAAAA GCTTAGTTAT 540

GTGACTGCTT GCGTATKTGT GATTCATTGA CTAGTTGKGT GTTTACTACG GATGTTTAAC 600

AGGTCAAGGA GTAATGCTTG AGAAGCATAT TTAAGTTTTT ATTGTATGCA TGAATATCCA 660

GTAAGCATCA TAGAAAATGT AAAATTAANT TGTTAAATAA TTAGAATACA TAGAAGAAAT 720

TGTTTAGATA AATATNATCT ATCTGAACAA TAAGGATGTC AGGATAGGAA AAGCTCTGTT 780

TCTGCAGCTT CCAGTGGAGA TCAGCACAGG AGGGAACTTA TTTTTT 826
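
The genomic listings of SEQ ID NO:51 and SEQ ID NO:61 can be compared over bases 361 to 390 to see how a restriction digest could distinguish them. The sketch below is hedged: the choice of SmaI (recognition sequence CCCGGG) is an assumption made for illustration and is not stated in this part of the document, and `find_sites` is a hypothetical helper, not a library function.

```python
def find_sites(dna, site):
    """Return every 0-based position at which `site` occurs in `dna`."""
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        start = dna.find(site, start + 1)
    return positions


# Assumption: SmaI, which recognizes CCCGGG, is used as the example enzyme.
SMAI_SITE = "CCCGGG"

# Bases 361..390 of the two 826-base genomic listings.
WILDTYPE_WINDOW = "ATTGATTATTTCCCGGGAACCCATAACAAA"  # from SEQ ID NO:51
MUTANT_WINDOW = "ATTGATTATTTCCCAGGAACCCATAACAAA"    # from SEQ ID NO:61

# Positions (0-based within the window) where the two listings differ.
differences = [
    index
    for index, (wild, mutant) in enumerate(zip(WILDTYPE_WINDOW, MUTANT_WINDOW))
    if wild != mutant
]

wildtype_sites = find_sites(WILDTYPE_WINDOW, SMAI_SITE)
mutant_sites = find_sites(MUTANT_WINDOW, SMAI_SITE)

print(differences, wildtype_sites, mutant_sites)
```

In this window the two listings differ at a single base, and that base lies inside the only CCCGGG occurrence, so under the stated assumption an amplified fragment from the first listing would be cut while one from the second would remain intact.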

WHAT IS CLAIMED IS:

1. A purified cytochrome P450 2C19 polypeptide
comprising an amino acid sequence having at least 97% sequence
identity with the amino acid sequence designated SEQ. ID.
No. 1.

2. A purified DNA segment encoding the purified
polypeptide of claim 1.

3. A stable cell line comprising an exogenous DNA
segment encoding a cytochrome P450 2C19 polypeptide of
claim 1, the DNA segment capable of being expressed in the
cell line.

4. A method of screening for a drug that is
metabolized by S-mephenytoin 4'-hydroxylase activity, the
method comprising the steps of:

contacting the drug with a cytochrome P450 2C19
polypeptide of claim 1; and

detecting a metabolic product resulting from an
interaction between the drug and the polypeptide, the presence
of the product indicating the drug is metabolized by the
S-mephenytoin 4'-hydroxylase activity.

5. A method of diagnosing a patient having a
deficiency in S-mephenytoin 4'-hydroxylase activity, the
method comprising:

obtaining a sample of nucleic acids from the
patient; and

analyzing a cytochrome P450 2C19 DNA sequence
from the nucleic acids in the sample for the presence of a
polymorphism indicative of the deficiency.

6. The method of claim 5, further comprising the
step of amplifying the cytochrome P450 2C19 DNA sequence.

7. The method of claim 6, wherein the P450 2C19
DNA sequence is genomic.

8. The method of claim 7, wherein the amplifying
step is primed from a forward primer sufficiently
complementary with a first subsequence of the antisense strand
of the 2C19 sequence to hybridize therewith, and a reverse
primer sufficiently complementary to a second subsequence of
the sense strand of the 2C19 sequence to hybridize therewith.

9. The method of claim 8, wherein the polymorphism
occurs at nucleotide 681 of the coding region of the P450 2C19
DNA genomic sequence.

10. The method of claim 9, wherein the first
subsequence of the sense strand is upstream from nucleotide
681 of the coding region, and the second subsequence of the
antisense strand is downstream of nucleotide 681 of the coding
region.

11. The method of claim 10, wherein the analyzing
step comprises digesting the amplified DNA segment with a
restriction enzyme that recognizes a site including nucleotide
681 of the coding region.

12. The method of claim 8, wherein the polymorphism
occurs at nucleotide 636 of the coding region of the P450 2C19
DNA genomic sequence.

13. The method of claim 12, wherein the first
subsequence of the sense strand is upstream from nucleotide
636 of the coding region, and the second subsequence of the
antisense strand is downstream of nucleotide 636 of the coding
region.

14. The method of claim 13, wherein the analyzing
step comprises digesting the amplified DNA segment with a
restriction enzyme that recognizes a site including nucleotide
636 of the coding region.

15. The method of claim 8, wherein the polymorphism
occurs at nucleotide 636 or 681 of the coding region of the
P450 2C19 DNA genomic sequence, wherein the first subsequence
of the sense strand is upstream from nucleotide 636 of the
coding region, and the second subsequence of the antisense
strand is downstream of nucleotide 681 of the coding region.

16. The method of claim 9, wherein the forward 

primer has 

about 10-50 contiguous nucleotides from the 
wildtype 2C19 sequence (SEQ. ID. No. 51) shown in Fig. 16 
including the nucleotide at position 681 of the coding region; 

wherein the forward primer primes amplification 
from the complement of the wildtype 2C19 sequence without 
priming amplification from the complement of the mutant 2C19 
sequence shown in Fig. 16 (SEQ. ID. No. 61). 

17. The method of claim 16, wherein the 3' 
nucleotide of the forward primer is the nucleotide at position 
681. 

18. The method of claim 9, wherein the reverse 

primer has 

about 10-50 contiguous nucleotides from the 
complement of the wildtype 2C19 sequence (SEQ. ID. No. 51) 
shown in Fig. 16 including the complement to nucleotide 681 of 
the coding region; 

wherein the reverse primer primes amplification 
from the wildtype 2C19 sequence without priming amplification 
from the mutant 2C19 sequence (SEQ. ID. No. 61) shown in 
Fig. 16. 

19. The method of claim 18, wherein the 3'
nucleotide of the reverse primer is the complement of the 
nucleotide at position 681. 

20. The method of claim 9, wherein the forward
primer has

about 10-50 contiguous nucleotides from the
mutant 2C19 sequence shown in Fig. 16 including the nucleotide
at position 681 of the coding sequence,

wherein the forward primer primes amplification
from the complement of the mutant 2C19 sequence (SEQ. ID.
No. 61) without priming amplification from the complement of
the wildtype 2C19 (SEQ. ID. No. 51) sequence shown in Fig. 16.

21. The method of claim 20, wherein the 3'
nucleotide of the forward primer is the nucleotide at
position 681.

22. The method of claim 9, wherein the reverse
primer has

about 10-50 contiguous nucleotides from the
complement of the mutant 2C19 sequence (SEQ. ID. No. 61) shown
in Fig. 16 including the complement to nucleotide 681 of the
coding region;

wherein the reverse primer primes amplification
from the mutant 2C19 sequence without priming amplification
from the wildtype 2C19 (SEQ. ID. No. 51) sequence shown in
Fig. 16.

23. The method of claim 22, wherein the 3'
nucleotide of the reverse primer is the complement of the
nucleotide at position 681.

24. The method of claim 12, wherein the forward
primer has

about 10-50 contiguous nucleotides from the
wildtype 2C19 sequence (SEQ. ID. No. 52) shown in Fig. 17
including the nucleotide at position 636 of the coding region;

wherein the forward primer primes amplification
from the complement of the wildtype 2C19 sequence (SEQ. ID.
No. 54) without priming amplification from the complement of
the mutant 2C19 sequence shown in Fig. 17.

25. The method of claim 12, wherein the reverse
primer has

about 10-50 contiguous nucleotides from the
complement of the wildtype 2C19 sequence (SEQ. ID. No. 52)
shown in Fig. 17 including the complement to nucleotide 636 of
the coding region;

wherein the reverse primer primes amplification
from the wildtype 2C19 sequence without priming amplification
from the mutant 2C19 sequence (SEQ. ID. No. 54) shown in
Fig. 17.

26. The method of claim 12, wherein the forward
primer has

about 10-50 contiguous nucleotides from the
mutant 2C19 sequence (SEQ. ID. No. 54) shown in Fig. 17
including the nucleotide at position 636 of the coding
sequence,

wherein the forward primer primes amplification
from the complement of the mutant 2C19 sequence without
priming amplification from the complement of the wildtype 2C19
sequence (SEQ. ID. No. 52) shown in Fig. 17.

27. The method of claim 12, wherein the reverse
primer has

about 10-50 contiguous nucleotides from the
complement of the mutant 2C19 sequence (SEQ. ID. No. 54) shown
in Fig. 17 including the complement to nucleotide 636 of the
coding region;

wherein the reverse primer primes amplification
from the mutant 2C19 sequence without priming amplification
from the wildtype 2C19 sequence (SEQ. ID. No. 52) shown in
Fig. 17.

28. The method of claim 6, wherein the segment of
the 2C19 sequence to be amplified is a cDNA sequence, and the
method further comprises the step of reverse transcribing mRNA
in the sample to produce the cDNA sequence.

29. The method of claim 28, wherein the forward
primer comprises about 10-50 contiguous nucleotides upstream
of nucleotide 643 of the coding region of the wildtype 2C19
cDNA sequence (SEQ. ID. No. 49) shown in Fig. 12 and
hybridizes to the complement of the 2C19 sequence upstream
from nucleotide 643 of the coding region, and the reverse
primer comprises about 10-50 contiguous nucleotides from the
complement of the wildtype 2C19 cDNA sequence (SEQ. ID. No. 49)
shown in Fig. 12 and hybridizes to the 2C19 sequence
downstream from nucleotide 682 of the coding region.

30. The method of claim 28, wherein the forward
primer hybridizes to the complement of the wildtype 2C19 cDNA
sequence (SEQ. ID. No. 49) shown in Fig. 12 between
nucleotides 643 and 682 without hybridizing to the complement
of the mutant 2C19 cDNA sequence (SEQ. ID. No. 50) shown in
Fig. 12.

31. The method of claim 30, wherein the reverse
primer hybridizes to the wildtype 2C19 cDNA sequence (SEQ. ID.
No. 49) shown in Fig. 12 between nucleotides 643 and 682
without hybridizing to the mutant 2C19 cDNA sequence (SEQ. ID.
No. 50) shown in Fig. 12.

32. The method of claim 28, wherein the forward
primer comprises about 10-50 contiguous nucleotides upstream
of nucleotide 636 of the coding region of the wildtype 2C19
cDNA sequence (SEQ. ID. No. 49) shown in Fig. 12, and the
reverse primer comprises about 10-50 contiguous nucleotides
from the complement of the wildtype 2C19 cDNA sequence (SEQ.
ID. No. 49) shown in Fig. 12 downstream from nucleotide 636 of
the coding region.

33. The method of claim 28, wherein the full-length
2C19 cDNA sequence is amplified.

34. The method of claim 33, further comprising the
step of sequencing a segment of the 2C19 cDNA sequence.



35. The method of claim 5 further comprising the
step of:

digesting the DNA with a restriction enzyme
that recognizes a site including nucleotide 636 or 681 of the
2C19 DNA sequence;

wherein:

the 2C19 DNA sequence is genomic; and

the analyzing step comprises detecting the
products resulting from the digestion by Southern blotting
with a labelled segment of the 2C19 DNA sequence as a probe.

36. A diagnostic kit comprising:

a forward primer sufficiently complementary
with a first subsequence of the antisense strand of a
double-stranded 2C19 genomic DNA sequence to hybridize therewith, and
a reverse primer sufficiently complementary with a second
subsequence of the sense strand of the 2C19 genomic sequence
to hybridize therewith;

wherein the first subsequence is upstream of
nucleotide 681 of the coding region, and second subsequence is
downstream of nucleotide 681 of the coding region.

37. The diagnostic kit of claim 36, wherein the
first subsequence is upstream from nucleotide 636 of the
coding region.

38. The diagnostic kit of claim 36, wherein the
forward primer has about 10-50 contiguous nucleotides from the
wildtype 2C19 sequence (SEQ. ID. No. 51) shown in Fig. 16, and
the reverse primer has about 10-50 contiguous nucleotides from
the complement of the wildtype 2C19 sequence (SEQ. ID. No. 51)
shown in Fig. 16.
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1 39. The diagnostic kit of claim 38, further 

2 comprising 

3 a second forward primer sufficiently 

4 complementary with a first subsequence of the antisense strand 

5 of a double-stranded 2C19 genomic DNA sequence to hybridize 

6 therewith, and a a second reverse primer sufficiently 

7 complementary with a second subsequence of the sense strand of 

8 the 2C19 genomic sequence to hybridize therewith; 

9 wherein the first subsequence is upstream of 

10 nucleotide 636 of the coding region, and second subsequence is 

11 downstream of nucleotide 636 of the coding region, 

1 40, The diagnostic kit of claim 39, further 

2 comprising a restriction enzyme that recognizes a site that 

3 includes nucleotide 681 or nucleotide 63 6 of the coding 

4 region. 

41. A primer selected from the group consisting of:
    (a) a first forward primer having:
        about 10-50 contiguous nucleotides from the wildtype 2C19
sequence (SEQ. ID. No. 51) shown in Fig. 16 including the nucleotide
at position 681 of the coding region;
        wherein the first forward primer primes amplification from
the complement of the wildtype 2C19 sequence without priming
amplification from the complement of the mutant 2C19 sequence
(SEQ. ID. No. 61) shown in Fig. 16;
    (b) a first reverse primer having:
        about 10-50 contiguous nucleotides from the complement of
the wildtype 2C19 sequence (SEQ. ID. No. 51) shown in Fig. 16
including the complement to nucleotide 681 of the coding region;
        wherein the first reverse primer primes amplification from
the wildtype 2C19 sequence without priming amplification from the
mutant 2C19 sequence shown in Fig. 16;

    (c) a second forward primer having:
        about 10-50 contiguous nucleotides from the mutant 2C19
sequence (SEQ. ID. No. 61) shown in Fig. 16 including the nucleotide
at position 681 of the coding sequence,
        wherein the second forward primer primes amplification from
the complement of the mutant 2C19 sequence without priming
amplification from the complement of the wildtype 2C19 sequence
(SEQ. ID. No. 51) shown in Fig. 16; and
    (d) a second reverse primer having:
        about 10-50 contiguous nucleotides from the complement of
the mutant 2C19 sequence (SEQ. ID. No. 61) shown in Fig. 16
including the complement to nucleotide 681 of the coding region;
        wherein the second reverse primer primes amplification from
the mutant 2C19 sequence without priming amplification from the
wildtype 2C19 sequence (SEQ. ID. No. 51) shown in Fig. 16;

    (e) a third forward primer having:
        about 10-50 contiguous nucleotides from the wildtype 2C19
sequence (SEQ. ID. No. 52) shown in Fig. 17 including the nucleotide
at position 636 of the coding region;
        wherein the third forward primer primes amplification from
the complement of the wildtype 2C19 sequence without priming
amplification from the complement of the mutant 2C19 sequence
(SEQ. ID. No. 54) shown in Fig. 17;
    (f) a third reverse primer having:
        about 10-50 contiguous nucleotides from the complement of
the wildtype 2C19 sequence (SEQ. ID. No. 52) shown in Fig. 17
including the complement to nucleotide 636 of the coding region;
        wherein the third reverse primer primes amplification from
the wildtype 2C19 sequence without priming amplification from the
mutant 2C19 sequence (SEQ. ID. No. 54) shown in Fig. 17;
    (g) a fourth forward primer having:
        about 10-50 contiguous nucleotides from the mutant 2C19
sequence (SEQ. ID. No. 54) shown in Fig. 17 including the nucleotide
at position 636 of the coding sequence,
        wherein the fourth forward primer primes amplification from
the complement of the mutant 2C19 sequence without priming
amplification from the complement of the wildtype 2C19 sequence
(SEQ. ID. No. 52) shown in Fig. 17; and
    (h) a fourth reverse primer having:
        about 10-50 contiguous nucleotides from the complement of
the mutant 2C19 sequence (SEQ. ID. No. 54) shown in Fig. 17
including the complement to nucleotide 636 of the coding region;
        wherein the fourth reverse primer primes amplification from
the mutant 2C19 sequence without priming amplification from the
wildtype 2C19 sequence (SEQ. ID. No. 52) shown in Fig. 17.
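
The allele-specific behaviour required of the primers in claim 41 can be sketched in Python as follows. One common way to obtain it, assumed here and not stated in the claim, is to place the variant base at the 3' end of the primer. The sequences are transcribed from the wild-type and mutant rows of Fig. 17 (OCR text; may contain errors), the 12-nucleotide length and all function names are hypothetical, and "priming" is reduced to a crude exact-match test that ignores annealing conditions.

```python
# Wild-type and mutant fragments as transcribed from Fig. 17
# (OCR text; may contain errors).
WT = "TGTAAGCACCCCCTGGATCCAGGTAAGGACAAGTTTTGTGCTTCCTGAGA"
MUT = "TGTAAGCACCCCCTGAATCCAGGTAAGGACAAGTTTTGTGCTTCCTGAGA"

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def variant_index(wt: str, mut: str) -> int:
    """0-based index of the single position at which the two strands differ."""
    diffs = [i for i, (a, b) in enumerate(zip(wt, mut)) if a != b]
    if len(wt) != len(mut) or len(diffs) != 1:
        raise ValueError("expected equal-length sequences with one difference")
    return diffs[0]


def allele_specific_forward(sense: str, idx: int, length: int = 12) -> str:
    """Forward primer copied from the sense strand, 3' end on the variant base."""
    return sense[idx - length + 1:idx + 1]


def allele_specific_reverse(sense: str, idx: int, length: int = 12) -> str:
    """Reverse primer whose 3' end is the complement of the variant base."""
    return reverse_complement(sense[idx:idx + length])


def forward_primes(primer: str, sense: str) -> bool:
    """Crude model: a forward primer primes only on an exact match."""
    return primer in sense


def reverse_primes(primer: str, sense: str) -> bool:
    """Crude model: a reverse primer primes only if its complement is present."""
    return reverse_complement(primer) in sense


idx = variant_index(WT, MUT)
wt_fwd = allele_specific_forward(WT, idx)
mut_fwd = allele_specific_forward(MUT, idx)
wt_rev = allele_specific_reverse(WT, idx)
mut_rev = allele_specific_reverse(MUT, idx)
for name, primer in [("wt_fwd", wt_fwd), ("mut_fwd", mut_fwd)]:
    print(name, primer, forward_primes(primer, WT), forward_primes(primer, MUT))
for name, primer in [("wt_rev", wt_rev), ("mut_rev", mut_rev)]:
    print(name, primer, reverse_primes(primer, WT), reverse_primes(primer, MUT))
```

Each primer built this way matches only the allele it was derived from. In practice the discrimination depends on polymerase and reaction conditions, so the exact-match test should be read as a stand-in for an experimentally validated assay.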
GTTCATTCAA TTGCTGACTA
2c
2c8
25
65
29c   GTCACAGTCA GAGTCAGAAT CACAGGTGGA TTAGTAGGGA GTGTTATAAA
6b
11a













      -51                                                 -1
2c            AG TGAAAGCCCG CAGTTGTCTT ACTAAGAAGA GAAG.CTTCA
2c8                                                        A
25                                             GA GAAGGCTTCA
65                                                GAAGGCTTCA
29c   AGCCTTGAAG TGAAAGCCCG CAGTTGTCTT ACTAAGAAGA GAAGCCTTCA
6b            AG TGAAAGCCCG CAGTTGTCTT ACTAAGAAGA GAAGCCTTCA
11a                                                    CTTCA



      1                                                   50
2c    ATGGAtcCt. tTGTGGtcCT [?]GTGCTcTGT [?]TCTCaTgTt TGcTTCTCcT
2c8   ATGGAACCTT TTGTGGTCCT GGTGCTGTGT [illegible] TGCTTCTCTT
25    ATGGATTCTC TTGTGGTCCT TGTGCTCTGT [?]TCATGTT TGCTTCTCCT
65    ATGGATTCTC TTGTGGTCCT TGTGCTCTGT [missing]  TGCTTCTCCT
[?]   ATGGATCCAG [illegible] GGTGCTGTGT [illegible] TGTTTCTCCT

      51                                                 100
2c    TTCAcTCTGG AGaCAGAGCT cTgGgAGAgG .Aa.CTCCCt cCTGGCCCCA
2c8   TTCACTCTGG AGACAGAGCT GTAGGAGAAG GAAGCTCCCT [missing]
25    TTCACTCTGG AGACAGAGCT CTGGGAGAGG AAAACTCCCT CCTGGCCCCA
65    [illegible] AGACAGAGCT CTGGGAGAGG AAAAC[?]CCCT CCTGGCCCCA
29c   TTCACTCTGG AGGCAGAGCT CTGGAAGAGG GAGGCTCCCG TCTGGCCCCA
6b    TTCACTCTGG AGGCAGAGCT CTGGAAGAGG GAGGCTCCCG TCTGGCCCCA
11a   TTCAATCTGG AGACAGAGCT CTGGGAGAGG AAAACTCCCT CCTGGCCCCA




      151                                                200
2c    aGCAAATCcT TaACCAAT.T CTCAAAagTC TATGGcCCTG TGTTCACt.T
2c8   TGCAAATCTT TCACCAATTT CTCAAAAGTC TATGGTCCTG TGTTCACCGT
25    AGCAAATCCT TAACCAATCT CTCAAAGGTC TATGGCCCTG TGTTCACTCT
65    [illegible]
29c   AGCAAATCCT TAACCAATTT CTCAAAAGTC TATGGCCCTG TGTTCACTGT
6b    AGCAAATCCT TAACCAATTT CTCAAAAGTC TATGGCCCTG TGTTCACTGT
11a   AGCAAATCCT TAACCAATCT CTCAAAAATC TATGGCCCTG TGTTCACTCT




      201                                                250
2c    GTATTTTGGC cTGaAaCcCA TaGTGGTG.T gCATGGATAT GAaGcaGTGA
2c8   GTATTTTGGC ATGAATCCCA TAGTGGTGTT TCATGGATAT GAGGCAGTGA
25    GTATTTTGGC CTGAAACCCA TAGTGGTGCT GCATGGATAT GAAGCAGTGA
65    [illegible]
29c   GTATTTTGGC CTGAAGCCCA TTGTGGTGTT GCATGGATAT GAAGCAGTGA
11a   GTATTTTGGC CTGGAACGCA TGGTGGTGCT GCATGGATAT GAAGTGGTGA




      251                                                300
2c    AGGAaGCCCT GATTGATc.T GGAGAGGAGT TTTCTGGAAG AGGca.TTtc
2c8   AGG[?]AGCCCT GATTGATAAT GGAGAGGAGT TTTCTGGAAG AGGCAATTCC
25    AGGAAGCCCT GATTGATCTT GGAGAGGAGT TTTCTGGAAG AGGCATTTTC
65    AGGAAGCCCT GATTGATCTT GGAGAGGAGT TTTCTGGAAG AGGCATTTTC
29c   AGGAGGCCCT GATTGATCAT GGAGAGGAGT TTTCTGGAAG AGGAAGTTTT
6b    AGGAGGCCCT GATTGATCAT GGAGAGGAGT TTTCTGGAAG AGGAAGTTTT
11a   AGGAAGCCCT GATTGATCTT GGAGAGGAGT TTTCTGGAAG AGGCCATTTC




      301                                                350
2c    CCAcTggCTg AAAgAg.TAa cA.AGGA.TT GGAATcgTTT tCAGCAATGG
2c8   CCAATATCTC AAAGAATTAC TAAAGGACTT GGAATCATTT CCAGCAATGG
25    CCACTGGCTG AAAGAGCTAA CAGAGGATTT GGAATTGTTT TCAGCAATGG
65    [illegible] AAAGAGCTAA CAGAGGATTT GGAATTGTTT TCAGCAATGG
29c   CCAGTGGCTG AAA[?]AAGTTAA CAAAGGACTT GGAATCGTTT TCAGCAATGG
6b    CCAGTGGCTG AAAAAGTTAA CAAAGGACTT GGAATCCTTT TCAGCAATGG
11a   CCACTGGCTG AAAGAGCTAA CAGAGGATTT GGAATCGTTT TCAGCAATGG

      351                                                400
2c    AAAGAgATGG AAGGAGATCC GGCGTTTCTc CCTCAtgAcg cTGCGGAATT
2c8   AAAGAGATGG AAGGAGATCC GGCGTTTC[?]C CCTCACAAA[?] TTGCGGAATT
25    AAAGAAATGG AAGGAGATCC GGCGTTTCTC CCTCATGA[?] CTGCGGAATT
65    AAAGAAATGG AAGGAGATCC GGCGTTTCTC CCTCATGACG CTGCGGAATT
29c   AAAGAGATGG AAGGAGATCC GGCGTTTCTG CCTCATGACT CTGCGGAATT
6b    AAAGAGATGG AAGGAGATCC GGCGTTTCTG [illegible] [illegible]
11a   AAAGAGATGG AAGGAGATC[?] GGCGTTTCTC CCTCATGACG CTGCGGAATT




      401                                                450
2c    TTGGGATGGG GAAGAGGAGC ATtGAGGACC GTGTTCAAGA GGAAGCcCgC
2c8   TTGGGATGGG GAAGAGGAGC ATTGAGGACC GTGTTCAAGA GGAAGCTCAC
25    TTGGGATGGG GAAGAGGAGC ATTGAGGACC GRGRRCAAGA GGAAGCCCGC
65    TTGGGATGGG GAAGAGGAGC ATTGAGGACC GTGTTCAAGA GGAAGCCCGC
29c   TTGGGATGGG GAAGAGGAGC ATCGAGGACC GTGTTCAAGA GGAAGCCCGC
6b    TTGGGATGGG GAAGAGGAGC ATCGAGGACC GTGTTCAAGA GGAAGCCCGC
11a   TTGGGATGGG GAAGAGGAGC ATTGAGGACC GTGTTCAAGA GGAAGCCCGC

      451                                                500
2c    TGCCTTGTGG AGGAGTTGAG AAAAACCAAg GCcTCACCCT GTGATCCCAC
2c8   TGCCTTGTGG AGGAGTTGAG AAAAACCAAG GCTTCACCCT GTGATCCCAC
25    TGCCTTGTGG AGGAGTTGAG AAAAACCAAG GCCTCACCCT GTGATCCCAC
65    TGCCTTGTGG AGGAGTTGAG AAAAACCAAG GCCTCACCCT GTGATCCCAC
29c   TGCCTTGTGG AGGAGTTGAG AAAAACCAAT GCCTCACCCT GTGATCCCAC
6b    TGCCTTGTGG AGGAGTTGAG AAAAACCAAT GCCTCACCCT GTGATCCCAC
11a   TGCCTTGTGG AGGAGTTGAG AAAAACCAAG GCTTCACCCT GTGATCCCAC




      501                                                550
2c    TTTCATCCTG GGCTGTGCTC CCTGCAATGT GATCTGCTCc .TTaTTTTCC
2c8   TTTCATCCTG GGCTGTGCTC CCTGCAATGT GATCTGCTCC GTTGTTTTCC
25    TTTCATCCTG GGCTGTGCTC CCTGCAATGT GATCTGCTCC ATTATTTTCC
65    TTTCATCCTG GGCTGTGCTC CCTGCAATGT GATCTGCTCC ATTATTTTCC
29c   TTTCATCCTG GGCTGTGCTC CCTGCAATGT GATCTGCTCT GTTATTTTCC
6b    TTTCATCCTG GGCTGTGCTC CCTGCAATGT GATCTGCTCT GTTATTTTCC
11a   TTTCATCCTG GGCTGTGCTC CCTGCAATGT GATCTGCTCC ATTATTTTCC

      551                                                600
2c    AtaAaCG.TT t.GATTATAAA GATCAG.aaT TTCTtAaCtT gATGgAAAaa
2c8   AGAAACGATT TGATTATAAA GATCAGAATT TTCTCACCCT GATGAAAAGA
25    ATAAACGTTT TGATTATAAA GATCAGCAAT TTCTTAACTT AATGGAAAAG
65    ATAAACGTTT TGATTATAAA GATCAGCAAT TTCTTAACTT AATGGAAAAG
29c   ATGATCGATT TGATTATAAA GATCAGAGGT [illegible] GATGGAAAAA
6b    ATGATCGATT TGATTATAAA GATCAGAGGT [illegible] GATGGAAAAA
11a   AGAAACGTTT CGATTATAAA GATCAGCAAT [missing]  [?]ATGGAAAAA

      601                                                650
2c    TT.AATGAAA ACaTCAgGAT TcTgAgC.cc CC.TGGATCC AG.TcTGCAA
2c8   TTCAATGAAA ACTTCAGGAT TCTGAACTCC CCATGGATCC AGGTCTGCAA
25    TTGAATGAAA ACATCAAGAT TTTGAGCAGC CCCTGGATCC AGATCTGCAA
65    TTGAATGAAA ACATCAAGAT TTTGAGCAGC CCCTGGATCC AGATCTGCAA
29c   TTCAATGAAA ACCTCAGGAT [?]CTGAGCT[?] CCA[?]GGAT[?]C AGGTCTGCAA
6b    TTCAATGAAA ACCTCAGGAT TCTGAGCTCT CCATGGATCC AGGTCTGCAA
11a   TTGAATGAAA ACATCAGGAT TGTAAGCACC CCCTGGATCC AGATATGCAA




      651                                                700
2c    TAATTT.cCt cct.TCATtG ATTattTCCC .GGAActCA. AAcAAAtTac
2c8   TAATTTCCCT CTACTCATTG ATTGTTTCCC AGGAACTCAC AACAAAGTGC
25    TAATTTTTCT CCTATCATTG ATTACTTCCC GGGAACTCAC AACAAATTAC
65    TAATTTTTCT CCTATCATTG ATTACTTCCC GGGAACTCAC AACAAATTAC
29c   TAATTTCCCT GCTCTCATCG ATTATCTCCC AGGAAGTCAT AATAAAATAG
6b    [illegible] GCTCTCATCG ATTATCTCCC AGGAAGTCAT AATAAAATAG
11a   [illegible] ACTATCATTG ATTATTTCCC GGGAACCCAT AACAAATTAC




      701                                                750
2c    tTaAAAA.gT TGCTtttAtg aaAAGTtAta TtttGGAgAa AgTAAAAGAA
2c8   TTAAAAATGT TGCTCTTACA CGAAGTTACA TTAGGGAGAA AGTAAAAGAA
25    TTAAAAACGT TGCTTTTATG AAAAGTTATA TTTTGGAAAA AGTAAAAGAA
65    TTAAAAACGT TGCTTTTATG AAAAGTTATA TTTTGGAAAA AGTAAAAGAA
29c   CTGAAAATTT TGCTTACATT AAAAGTTATG TATTGGAGAG AATAAAAGAA
6b    CTGAAAATTT TGCTTACATT AAAAGTTATG TATTGGAGAG AATAAAAGAA
11a   TTAAAAACCT TGCTTTTATG GAAAGTGATA TTTTGGAGAA AGTAAAAGAA

      751                                                800
2c    CAcCAAGaAT Ca.TGGAcaT gAACAa.cCT CgGGACTTTA TtGATTGcTT
2c8   CACCAAGCAT CACTGGATGT TAACAATCCT CGGGACTTTA TGGATTGCTT
25    CACCAAGAAT CAATGGACAT GAACAACCCT CAGGACTTTA TTGATTGCTT
65    CACCAAGAAT CAATGGACAT GAACAACCCT CAGGACTTTA TTGATTGCTT
29c   CATCAAGAAT CCCTGGACAT GAACAGTGCT CGGGACTTTA TTGATTGTTT
6b    CATCAAGAAT CCCTGGACAT GAACAGTGCT CGGGACTTTA TTGATTGTTT
11a   CACCAAGAAT CGATGGACAT CAACAACCCT CGGGACTTTA TTGATTGCTT




      801                                                850
2c    CCTGATcAAA ATGGAg.AGG AAAAGcAcAA cCAAcagTCt GAATTtAcTa
2c8   CCTGATCAAA ATGGAGCAGG AAAAGGACAA CCAAAAGTCA GAATTCAATA
25    CCTGATGAAA ATGGAGAAGG AAAAGCACAA CCAACCATCT GAATTTACTA
65    CCTGATGAAA ATGGAGAAGG AAAAGCACAA CCAACCATCT GAATTTACTA
29c   CCTGATCAAA ATGGAACAGG AAAAGCACAA TCAACAGTCT GAATTTACTG
6b    CCTGATCAAA ATGGAACAGG AAAAGCACAA TCAACAGTCT GAATTTACTG
11a   CCTGATCAAA ATGGAGAAGG AAAAGCAAAA CCAACAGTCT [missing]

      851                                                900
2c    [missing]  [missing]  [missing]  TgtTTGgaGC TGG.ACAGAG
2c8   [missing]  GGTTGGCACT GTAGCTGATC TATTTGTTGC TGGAACAGAG
25    [illegible]                      TGTTTGGAGC [?]GGGACAGAG
65    [illegible]
29c   TTGAAAGCTT GATAGCCACT GTAACTGATA TGTTTGGGGC TGGAACAGAG
6b    TTGAAAGCTT GATAGCCACT GTAACTGATA TGTTTGGGGC TGGAACAGAG
11a   TTGAAAACTT GGTAATCACT GCAGCTGACT TACTTGGAGC TGGGACAGAG




      901                                                950
2c    ACaACaAGCA C.AC.CTGAG ATATG..CTC CT.CTCCTGC TGAAGcACCC
2c8   ACAACAAGCA CCACTCTGAG ATATGGACTC CTGCTCCTGC TGAAGCACCC
25    ACGACAAGCA CAACCCTGAG ATATGCTCTC CTTCTCCTGC TGAAGCACCC
65    ACGACAAGCA CAACCCTGAG ATATGCTCTC CTTCTCCTGC TGAAGCACCC
29c   ACAACGAGCA CCACTCTGAG ATATGGACTC CTGCTCCTGC TGAAGTACCC
6b    ACAACGAGCA CCACTCTGAG ATATGGACTC CTGCTCCTGC TGAAGTACCC
11a   ACAACAAGCA CAACCCTGAG ATATGCTCTC CTTCTCCTGC TGAAGCACCC



      951                                               1000
2c    AGAGGTCACA GCTAAAGTCC AGGAAGAGAT TGAacgTGTa aTTGGCAGAa
2c8   AGAGGTCACA GCTAAAGTCC AGGAAGAGAT TGATCATGTA ATTGGCAGAC
25    AGAGGTCACA GCTAAAGTCC AGGAAGAGAT TGAACGTGTG ATTGGCAGAA
65    AGAGGTCACA GCTAAAGTCC AGGAAGAGAT TGAACGTGTG ATTGGCAGAA
29c   AGAGGTCACA GCTAAAGTCC AGGAAGAGAT TGAATGTGTA GTTGGCAGAA
6b    AGAGGTCACA GCTAAAGTCC AGGAAGAGAT TGAATGTGTA GTTGGCAGAA
11a   AGAGGTCACA GCTAAAGTCC AGGAAGAGAT TGAACGTGTG ATTGGCAGAA

      1001                                              1050
2c    ACcGGAGCCC CTGcATGCAc GAcAGGaGcC ACATGCCcTA CACaGATGCT
2c8   ACAGGAGCCC CTGCATGCAG GATAGGAGCC ACATGCCTTA CACTGATGCT
25    ACCGGAGCCC CTGCATGCAA GACAGGAGCC ACATGCCCTA CACAGATGCT
65    ACCGGAGCCC CTGCATGCAA GACAGGAGCC ACATGCCCTA CACAGATGCT
29c   ACCGGAGCCC CTGTATGCAG GACAGGAGTC ACATGCCCTA CACAGATGCT
6b    ACCGGAGCCC CTGTATGCAG GACAGGAGTC ACATGCCCTA CACAGATGCT
11a   ACCGGAGCCC CTGCATGCAG GACAGGGGCC ACATGCCCTA CACAGATGCT





      1051                                              1100
2c    GTgGTGCACG AG.TCCAGAG ATACattGAC [?].cTCCCCA CCagccTGCC
2c8   GTAGTGCACG AGATCCAGAG ATACAGTGAC   TGTCCCCA CCGGTGTGCC
25    GTGGTGCAC[?] AGGTCCAGAG [?]TACCTTGAC   TCTCCCCA CCAGCCTGCC
65    GTGGTGCACG AGGTCCAGAG ATACATTGAC   TCTCCCCA CCAGCCTGCC
29c   GTGGTGCACG AGATCCAGAG ATACATTGAC   CCTCCCCA CCAACCTGCC
6b    [missing]  AGATCCAGAG ATACATTGAC [illegible] CCAACCTGCC
11a   [illegible] [illegible] ATACA[?]CGAC [illegible] CCAGCCTGCC

      1101                                              1150
2c    [illegible] ACCt[?]TGA. [missing]  AAACTAcCTC [illegible]
2c8   [illegible] [missing]  CTAAGTTCAG AAACTACCTC [?]TCCCCAAGG
25    CCATGCAGTG ACCTGTGACA TT[?]AATTCAG AAACTATCTC ATTCCCAAGG
65    CCATGCAGTG ACCTGTGACA TTAAATTCAG AAACTATCTC ATTCCCAAGG
29c   CCATGCAGTG ACCTGTGATG T[?]AAATTCAA AAACTACCTC ATCCCCAAGG
6b    CCATGCAGTG ACCTGTGATG TTAAATTCAA AAACTACCTC ATCCCCAAGG
11a   CCATGCAGTG [illegible] TTAAATTCAG AAACTACCTC [illegible]

      1151                                              1200
2c    [illegible]
2c8   [illegible]
25    GCACAACCAT ATTAATTTCC CTGACTTCTG TGCTACATGA CAACAAAGAA
65    GCACAACCAT ATTAATTTCC CTGACTTCTG TGCTACATGA CAACAAAGAA
      [remaining rows illegible]

      1951                                              2000
6b    GAAAGGTAAG AGGGTAGGAA AGCTGTTTTA GCTAAATGCC ACCTAGAGTT

      2001                                              2050
6b    ATTGGAGGTC TGAATTTGGA AAAAAAAACT ATGTCCAGGA GAACATTAAG

      2101                                              2150
6b    TGTTTGAATT CATGCTCTGG [?]TTTGTGTTA CTGTAAACAC AAGATCAAGA

      2151                                              2200
6b    TTTGGATAAT CTTTTTGCTT TGTGTTTCCA ACTTAGATCA TGTCTAAATA

      2201            2216
6b    TATGCTTTCA TATGGC
ATGGTGATGT AGnAAnTCAT nCCATCTTAT ATTTCnAGAG TGTAGAGGAG
GATTGTTGnG GAAGTAAGAG GnnTAAGATA GAGATGCnTT TATACTATCC
CAAGCAGGGA TrAGTCTAGG AAATGATTaT CGTCttTGAT TCTCTTGTCA
GrAttTTCTT TCTCmnATCT TGtATAATCA GAGaatTACT ACACATGgAC
AATrAarATT TCCCCnTCcA GAtAnACaAt ATATTTTATT TATATTTATA
GTTTTAAATT ACAACCAGAG CTTGGCATAT TGTATCTATA CCTTTAATAA
                              INTRON 4 | EXON 5
ATGCTTTTAA TTTAATAAAT TATTGTTTTC TCTTAG|ATAT GCAATAATTT
                          A
TCCCACTATC ATTGATTATT TCCCGGGAAC CCATAACAAA TTACTTAAAA
                          '681
ACCTTGCTTT TATGGAAAGT GATATTTTGG AGAAAGTAAA AGAACACCAA
GAATCGATGG ACATCAACAA CCCTCGGGAC TTTATTGATT GCTTGCTGAT
              | INTRON 5
CAAAATGGAG AAG|GTAAAAT GTTAACAAAA GCTTAGTTAT GTGACTGCTT
GCGTATkTGT GATTCATTGA CTAGTTGkGT GTTTACTACG GATGTTTAAC
AGGTCAAGGA GTAATGCTTG AGAAGCATAT TTAAGTTTTt ATTGTaTGCA
TGAATATCCA GTAAGCATCA TAGAAAATGT AAAATTAAnT TGtTAaATAa
TTAGAaTACA TAGAAGAAAT tGTTtAGATA AATATnATCT ATCTGAACAA
TAAGGATGTC AGGATAGGAA AAGCTCTGTT TCTGCAGCTT CCAGTGGAGA
TCAGCACAGG AGGGAACTTA TTTTTT

FIG. 16.
agggaaaagacaaataggccggggargnaaatttagcatgtgagcaacc  wt
ttanttaaccagctaggctgtaattgntaattcgagantaatgtnaaagt  wt
gatgtgttgattttatgcatgccnnactcntttttgcttttaaggggagt  wt
cataggtaagatattacttaaaatttctaaactattattatctattaact  wt

aatataa aatattttatatctaatgtttactcatattttaaaattgtttc  wt
atgaagtgttttatatctaatgtttactcatatttt aaaattgtttc  mutant

           SerProCysAspProThrPheIleLeuGlyCysAlaP
caatcatttagCTTCACCCTGTGATCCCACTTTCATCCTGGGCTGTGCTC  wt
caatcatttagCTTCACCCTGTGATCCCACTTTCATCCTGGGCTGTGCTC  mutant
482

roCysAsnValIleCysSerIleIlePheGlnLysArgPheAspTyrLys
CCTGCAATGTGATCTGCTCCATTATTTTCCAGAAACGTTTCGATTATAAA  wt
CCTGCAATGTGATCTGCTCCATTATTTTCCAGAAACGTTTCGATTATAAA  mutant
[His]

AspGlnGlnPheLeuAsnLeuMetGluLysLeuAsnGluAsnIleArgIl
GATCAGCAATTTCTTAACTTGATGGAAAAATTGAATGAAAACATCAGGAT  wt
GATCAGCAATTTCTTAACTTGATGGAAAAATTGAATGAAAACATCAGGAT  mutant

eValSerThrProTrpIleGln
TGTAAGCACCCCCTGGATCCAGgtaaggacaagttttgtgcttcctgaga  wt
TGTAAGCACCCCCTGAATCCAGgtaaggacaagttttgtgcttcctgaga  mutant
End 642

aaccacttacaatctttttttctgaaaaatccaaaattctatattaacca  wt
aaccacttacagtctttttttctgggaaatccaaaattctatatt  mutant

aaccrtaaagtacatttgtgaatactacagtcttgcctagacagccatggggt  wt
gaatatctggaaaagatggcaaagntctttattttatgcacaggaaatgaata  wt
tcccaatataaatcagactrctaaacccattagctccctgatcagcattt  wt

FIG. 17.
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